Abstract

We study the quantum complexity class $\mathrm{QNC}_f^0$ of quantum operations implementable exactly by constant-depth polynomial-size quantum circuits with unbounded fan-out gates (called $\mathrm{QNC}_f^0$ circuits). Our main result is that the quantum OR operation is in $\mathrm{QNC}_f^0$, which is an affirmative answer to the question of Høyer and Špalek. In sharp contrast to the strict hierarchy of the classical complexity classes, $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$, our result together with Høyer and Špalek's implies the collapse of the hierarchy of the corresponding quantum classes: $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$. Then, we show that there exists a constant-depth subquadratic-size quantum circuit for the quantum threshold operation. This bounds the size difference between the $\mathrm{QNC}_f^0$ and $\mathrm{QTC}_f^0$ circuits for implementing the same quantum operation. Lastly, we show that, if the quantum Fourier transform modulo a prime is in $\mathrm{QNC}_f^0$, there exists a polynomial-time exact classical algorithm for a discrete logarithm problem using a $\mathrm{QNC}_f^0$ oracle. This implies that, under a plausible assumption, there exists a classically hard problem that is solvable exactly by a $\mathrm{QNC}_f^0$ circuit with gates for the quantum Fourier transform.

1 Introduction and Summary of Results

Quantum computers are expected to solve some problems much faster than classical computers (e.g., Shor's factoring algorithm [21]). It is, however, still difficult to realize a quantum computer that can perform quantum algorithms for a reasonably large input size. A major obstacle to realizing a quantum computer is that, even if we can prepare many qubits, we can use them only for a short time because of their limited coherence time. In order to use such fragile qubits effectively, it is important to understand the possibilities and limitations of using them. This motivates us to study the computational power of quantum circuits with a small amount of computation time [HI [12] [10] [15] El [21 E].

In this paper, we focus on the theoretical analysis of the computational power of constant-depth polynomial-size quantum circuits, which allows us to analyze that of polylogarithmic-depth ones. The elementary gates are one-qubit, CNOT, and unbounded fan-out gates. The unbounded fan-out gate is an analog of the classical one normally assumed to be an elementary gate for the theoretical study of classical circuits [24]. The gate on $n+1$ qubits makes $n$ copies of a classical source bit in a superposition and, in particular, the gate on two qubits is a CNOT gate. It is theoretically interesting to deal with the gate as an elementary gate since the use of the gate clarifies many differences between quantum and classical circuits [12] [15] and connects the quantum circuit model with the one-way model [6].

There are three important settings for studying constant-depth classical circuits. All the settings allow the use of (classical) unbounded fan-out gates. The first setting deals with constant-depth polynomial-size classical circuits consisting of NOT gates and OR and AND gates with bounded fan-in. The classical complexity class $\mathrm{NC}^0$ is the class of problems solvable by (uniform families of) the classical circuits in the setting. The second setting is the first one augmented with OR and AND gates with unbounded fan-in, which defines the class $\mathrm{AC}^0$. The third setting is the second one augmented with threshold gates with unbounded fan-in, which defines the class $\mathrm{TC}^0$. The threshold gate implements the threshold function that outputs the bit representing whether the Hamming weight of the input is less than a pre-determined threshold. These classes form a strict hierarchy: $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$ |1 11124).

Some authors consider the quantum counterparts of the above settings [Ml [12] US]. Although it is difficult to determine what the correct counterparts are, we regard the following settings as the counterparts [12], where all the settings allow the use of unbounded fan-out gates. The first setting deals with constant-depth polynomial-size quantum circuits consisting of one-qubit and CNOT gates. The quantum complexity class $\mathrm{QNC}_f^0$, which corresponds to $\mathrm{NC}^0$, is the class of quantum operations implementable exactly by (uniform families of) the quantum circuits in the setting (called $\mathrm{QNC}_f^0$ circuits). The second setting is the first one augmented with a quantum version of OR gates with unbounded fan-in, which defines the class $\mathrm{QAC}_f^0$, corresponding to $\mathrm{AC}^0$. The third setting is the second one augmented with a quantum version of threshold gates with unbounded fan-in, which defines the class $\mathrm{QTC}_f^0$, corresponding to $\mathrm{TC}^0$. It holds that $\mathrm{QNC}_f^0 \subseteq \mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$ [15].

First, in order to study the relationship between $\mathrm{QNC}_f^0$ and $\mathrm{QAC}_f^0$, we consider the question posed by Høyer and Špalek [15] as to whether an $O(1)$-depth $\mathrm{poly}(n)$-size quantum circuit can be constructed for the quantum operation $\mathrm{OR}_n$, which computes the OR function on $n$ bits. They showed that there exists an $O(\log^* n)$-depth $O(n \log n)$-size quantum circuit. It is a repetition of the OR reduction, which is represented as an $O(1)$-depth circuit that exactly reduces the computation of the OR function on $n$ bits to that on $O(\log n)$ bits. Based on their work, we give an affirmative answer to the question:

Theorem 1 There exists an $O(1)$-depth $O(n \log n)$-size quantum circuit for $\mathrm{OR}_n$.

Theorem 1 immediately implies that $\mathrm{OR}_n$ is in $\mathrm{QNC}_f^0$ and thus $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0$. Since $\mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$ as described above, the hierarchy of $\mathrm{QNC}_f^0$, $\mathrm{QAC}_f^0$, and $\mathrm{QTC}_f^0$ collapses, i.e., $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$. This is in sharp contrast to the strict hierarchy of the corresponding classical classes: $\mathrm{NC}^0 \subsetneq \mathrm{AC}^0 \subsetneq \mathrm{TC}^0$. More generally, Theorem 1 with Høyer and Špalek's result immediately implies that the hierarchy of polylogarithmic-depth exact quantum circuits collapses, i.e., $\mathrm{QNC}_f^k = \mathrm{QAC}_f^k = \mathrm{QTC}_f^k$ for any integer $k \geq 0$, where $\mathrm{QNC}_f^k$, $\mathrm{QAC}_f^k$, and $\mathrm{QTC}_f^k$ are defined similarly to $\mathrm{QNC}_f^0$, $\mathrm{QAC}_f^0$, and $\mathrm{QTC}_f^0$, respectively, except that they deal with $O(\log^k n)$-depth circuits in place of $O(1)$-depth ones.

Our idea for constructing the circuit is that, after we apply Høyer and Špalek's OR reduction, we compute the OR function on $O(\log n)$ bits in depth $O(1)$ and with size exponential in $\log n$. The exponential-size circuit is based on the representation of the OR function as an $\mathbb{R}$-linear combination of exponentially many parity functions. The proof of Theorem 1 depends on the fact that, in the $\mathrm{QNC}_f^0$ circuit, an unbounded fan-out gate can be used as a parity gate [12], which implements the parity function. We note, however, that the relationship $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0$ cannot be derived only from the computational power of parity gates in the corresponding classical circuit, i.e., in the $\mathrm{NC}^0$ circuit. This is because, even if the parity gates are allowed in the $\mathrm{NC}^0$ circuit, the OR function is not in $\mathrm{NC}^0$ [15].

Second, we apply Theorem 1 to studying the relationship between $\mathrm{QNC}_f^0$ and $\mathrm{QTC}_f^0$ in detail. To do this, we consider the problem of constructing an $O(1)$-depth small-size quantum circuit for the quantum threshold operation $\mathrm{TH}_n^t$, which computes the threshold function with a threshold $t$ on $n$ bits. Theorem 1 simply yields an $O(1)$-depth $O(tn \log n)$-size quantum circuit for $\mathrm{TH}_n^t$ with $1 \leq t \leq \lceil n/2 \rceil$ and an $O(1)$-depth $O((n-t+1)n \log n)$-size circuit with $\lceil n/2 \rceil < t \leq n$. We show that, using Theorem 1, for any $t$ such that the minimum of $t$ and $n-t$ is non-constant, there exists a smaller circuit:

Theorem 2 There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}_n^t$:

• An $O(n \log n)$-size circuit for any $1 \leq t \leq \log n$ or $n - \log n \leq t \leq n$.

• An $O(n\sqrt{t \log n})$-size circuit for any $\log n < t \leq \lceil n/2 \rceil$.

• An $O(n\sqrt{(n-t) \log n})$-size circuit for any $\lceil n/2 \rceil < t < n - \log n$.

Theorem 2 bounds the size difference between the $\mathrm{QNC}_f^0$ and $\mathrm{QTC}_f^0$ circuits for implementing the same quantum operation. Let $U_n$ be a quantum operation on $n$ qubits. Let us assume that we have an optimal-size $\mathrm{QTC}_f^0$ circuit for $U_n$ and its size is represented by some polynomial $s(n)$. Similarly, let $t(n)$ $(\geq s(n))$ be the optimal $\mathrm{QNC}_f^0$ circuit size. The definition of $\mathrm{QNC}_f^0$ only implies that $t(n)$ is bounded above by $\mathrm{poly}(n)$. Theorem 2 tells us more about this: $t(n)$ is $O(s(n)\sqrt{s(n) \log n})$. This is because we can obtain an $O(s(n)\sqrt{s(n) \log n})$-size $\mathrm{QNC}_f^0$ circuit for $U_n$ by transforming every threshold gate in the optimal-size $\mathrm{QTC}_f^0$ circuit into the $\mathrm{QNC}_f^0$ circuit by Theorem 2.

A key ingredient of the circuits in Theorem 2 is an $O(1)$-depth $O(n^2)$-size quantum circuit for the quantum counting operation, which computes the counting function on $n$ bits that outputs the binary representation of the Hamming weight of the input. Our idea for constructing the circuit is that, after we apply Høyer and Špalek's OR reduction, we implement a particular type of the quantum Fourier transform (QFT) on $O(\log n)$ qubits in depth $O(1)$ and with size exponential in $\log n$. The QFT part performs many projective measurements in parallel and applies the circuit in Theorem 1 to the classical outcomes of the measurements to estimate the phase of a Fourier state. It is similar to the $O(\log n)$-depth $O(n \log n)$-size quantum circuit for approximating the QFT on $n$ qubits [8]. The main difference is that the QFT part requires exponentially more gates than those in [8] to construct an $O(1)$-depth exact circuit. Nevertheless, the size is still $\mathrm{poly}(n)$ since the input size is $O(\log n)$.

Lastly, we apply Theorem 1 to studying the relationship between $\mathrm{QNC}_f^0$ and efficient classical computation. More concretely, based on Theorem 1, we study the existence of a classically hard problem¹ that is solvable exactly by a $\mathrm{QNC}_f^0$ circuit, where a problem is said to be classically hard if it cannot be solved by a polynomial-time bounded-error classical algorithm. To do this, we consider the question of whether a polynomial-time exact classical algorithm using a $\mathrm{QNC}_f^0$ oracle can be constructed for a discrete logarithm problem (DLP) that seems classically hard. Here, the $\mathrm{QNC}_f^0$ oracle solves, in classical constant time, a problem that is solvable exactly by a $\mathrm{QNC}_f^0$ circuit. Such an algorithm for the DLP implies the existence of the desired problem under the plausible assumption that the DLP is classically hard. This is because the algorithm with a polynomial-time bounded-error classical simulation of the $\mathrm{QNC}_f^0$ oracle would imply that the DLP is not classically hard.

Based on Shor's bounded-error quantum algorithm for the general DLP [21], Høyer and Špalek showed that there exists a polynomial-time bounded-error classical algorithm using a bounded-error version of the $\mathrm{QNC}_f^0$ oracle [15]. It is, however, difficult to directly transform the algorithm into an exact one. Based on van Dam's exact quantum algorithm for the general DLP [23], which is simpler than Mosca and Zalka's [17], we show that, using Theorem 1, under an assumption about the QFT, there exists the desired algorithm for a particular type of the DLP that seems classically hard:

Theorem 3 Let $q$ be a safe prime, i.e., a prime of the form $2p+1$ for some prime $p$, and $n = \lceil \log q \rceil$. If the QFT modulo $p$ is in $\mathrm{QNC}_f^0$, there exists a $\mathrm{poly}(n)$-time exact classical algorithm for the DLP over the multiplicative group of integers modulo $q$ using the $\mathrm{QNC}_f^0$ oracle.

We note that, as in the cryptographic literature, we assume that there exist infinitely many safe primes. Since we require the assumption about the QFT, Theorem 3 does not imply the existence of the above-mentioned problem (under a plausible assumption). It, however, allows us to deepen our understanding of the relationship among $\mathrm{QNC}_f^0$, the QFT, and efficient classical computation. In fact, it implies that, under the plausible assumption that the DLP in Theorem 3 is classically hard, there exists a classically hard problem that is solvable exactly by a $\mathrm{QNC}_f^0$ circuit with gates for the QFT modulo $p$.

Theorem 3 suggests the following key problem for further understanding the relationship between $\mathrm{QNC}_f^0$ and efficient classical or quantum computation: Is the QFT modulo $p$ in $\mathrm{QNC}_f^0$? If this is the case, Theorem 3 implies the existence of a classically hard problem that is solvable exactly by a $\mathrm{QNC}_f^0$ circuit (under a plausible assumption). If not, $\mathrm{QNC}_f^0$ is strictly weaker than efficient quantum computation; more precisely, it is strictly contained in the class of quantum operations implementable approximately (or even exactly) by polynomial-size quantum circuits. This is because the QFT modulo $p$ is in the latter class [17] [131 [15]. We leave the problem about the QFT modulo $p$ as an open problem.

The main components of (a slightly modified version of) van Dam's algorithm for the DLP are the 
QFT modulo p, arithmetic operations such as modular exponentiation, and an amplitude amplification 
procedure [5]. Our rigorous analysis of the algorithm shows that these components excluding the QFT 
can be implemented by using the OR functions and iterated multiplications with values pre-computed 
by polynomial-time exact classical algorithms. This analysis with Theorem 1 implies Theorem 3. 

The remainder of this paper is organized as follows. In Section 2, we give some definitions and the 
idea of the OR reduction to describe our results precisely. In Sections 3 and 4, we describe the circuits 
in Theorems 1 and 2, respectively. In Section 5, we describe the algorithm in Theorem 3. In Section 6, 
we give some open problems. Most of the proofs are given in Appendix A. 

¹ We deal with not only a decision problem, but also a relation problem, where a relation problem can have many valid (polynomial-length) outputs for an input. An algorithm for solving such a problem outputs any one of them [1].
2 Preliminaries



2.1 Quantum Circuits and Complexity Classes 

We use the standard notation for quantum states and the standard diagrams for quantum circuits [18] . 
A quantum circuit consists of elementary gates, where the elementary gates are one-qubit, CNOT, 
and unbounded fan-out gates (unless otherwise stated). An unbounded fan-out gate on $k+1$ qubits implements the quantum operation defined as

$$|y\rangle \bigotimes_{j=0}^{k-1} |x_j\rangle \mapsto |y\rangle \bigotimes_{j=0}^{k-1} |x_j \oplus y\rangle,$$

where $y, x_j \in \{0,1\}$, $k \geq 1$, and $\oplus$ denotes addition modulo 2. The first input qubit, i.e., the qubit in state $|y\rangle$, is called the control qubit. When $k = 1$, the gate is a CNOT gate. Since an unbounded fan-out gate makes copies of a classical source bit, we may say "copy" when we apply this gate. The
complexity measures of a quantum circuit are its size and depth. The size of a quantum circuit is 
defined as the total size of all elementary gates in it, where the size of an elementary gate is defined as 
the number of qubits affected by the gate. The depth of a quantum circuit is defined as follows. Input 
qubits are considered to have depth 0. For each gate G, the depth of G is equal to 1 plus the maximal 
depth of a gate on which G depends. The depth of a quantum circuit is defined as the maximal depth 
of a gate in it. Intuitively, the depth is the number of layers in the circuit, where a layer consists of 
gates that can be applied in parallel. A quantum circuit can use ancillary qubits initialized to $|0\rangle$.

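For any $a = a_0 \cdots a_{n-1} \in \{0,1\}^n \setminus \{0^n\}$, the parity function with value $a$ on $n$ bits, denoted as $\mathrm{PA}_n^a$, is defined as $\mathrm{PA}_n^a(x) = \bigoplus_{j=0}^{n-1} a_j x_j$, where $x = x_0 \cdots x_{n-1} \in \{0,1\}^n$. We denote $\mathrm{PA}_n^{1^n}$ as $\mathrm{PA}_n$. For example, $\mathrm{PA}_2^{10}(x) = x_0$, $\mathrm{PA}_2^{01}(x) = x_1$, and $\mathrm{PA}_2^{11}(x) = \mathrm{PA}_2(x) = x_0 \oplus x_1$. For any integer $1 \leq t \leq n$, the threshold function with a threshold $t$ on $n$ bits, denoted as $\mathrm{TH}_n^t$, is defined as $\mathrm{TH}_n^t(x) = 1$ if $|x| \geq t$ and $0$ otherwise, where $x = x_0 \cdots x_{n-1} \in \{0,1\}^n$ and $|x| = \sum_{j=0}^{n-1} x_j$, i.e., the Hamming weight of $x$. The OR function on $n$ bits, denoted as $\mathrm{OR}_n$, is defined as $\mathrm{TH}_n^1$. The AND function on $n$ bits, denoted as $\mathrm{AND}_n$, is defined as $\mathrm{TH}_n^n$. For any integer $1 \leq t \leq n$, the exact function with value $t$ on $n$ bits, denoted as $\mathrm{EX}_n^t$, is defined similarly to $\mathrm{TH}_n^t$ except that $|x| \geq t$ in the definition of $\mathrm{TH}_n^t$ is replaced with $|x| = t$. The function $\mathrm{EX}_n^0$ is defined as the negation of $\mathrm{OR}_n$. The quantum operation for computing $\mathrm{PA}_n^a$ is defined as

$$\left(\bigotimes_{j=0}^{n-1} |x_j\rangle\right) |z\rangle \mapsto \left(\bigotimes_{j=0}^{n-1} |x_j\rangle\right) |z \oplus \mathrm{PA}_n^a(x)\rangle,$$

where $x_j, z \in \{0,1\}$ and $x = x_0 \cdots x_{n-1}$. For simplicity, this operation is also denoted as $\mathrm{PA}_n^a$. The quantum operations $\mathrm{TH}_n^t$, $\mathrm{OR}_n$, $\mathrm{AND}_n$, and $\mathrm{EX}_n^t$ are defined similarly. For any integer $m > 0$, the quantum Fourier transform modulo $m$, denoted as $F_m$, is the quantum operation on $\lceil \log m \rceil$ qubits defined as $|x\rangle \mapsto \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} \omega_m^{xy} |y\rangle$, where $0 \leq x \leq m-1$ and $\omega_m = e^{2\pi i/m}$.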
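The Boolean functions defined above can be written down directly as a small classical sketch. The Python names below (`parity`, `threshold`, `exact`, `or_n`, `and_n`) are illustrative choices, not notation from the paper, and bit strings are represented as tuples of 0/1 values.

```python
def weight(x):
    """Hamming weight |x| of a bit tuple."""
    return sum(x)

def parity(a, x):
    """PA_n^a(x): XOR of the bits x_j selected by a_j = 1."""
    return sum(aj & xj for aj, xj in zip(a, x)) % 2

def threshold(t, x):
    """TH_n^t(x): 1 if |x| >= t, and 0 otherwise."""
    return 1 if weight(x) >= t else 0

def exact(t, x):
    """EX_n^t(x): 1 if |x| == t, and 0 otherwise (t = 0 gives the negation of OR)."""
    return 1 if weight(x) == t else 0

def or_n(x):
    """OR_n is TH_n^1."""
    return threshold(1, x)

def and_n(x):
    """AND_n is TH_n^n."""
    return threshold(len(x), x)
```

For instance, `parity((1, 1), (1, 0))` evaluates $\mathrm{PA}_2(x)$ on $x = 10$, and `exact(0, x)` agrees with `1 - or_n(x)` on every input.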

The quantum complexity class $\mathrm{QNC}_f^0$ is the class of quantum operations implementable exactly by (uniform families of) constant-depth polynomial-size quantum circuits consisting of the elementary gates described above. The definition of $\mathrm{QAC}_f^0$ is the same as that of $\mathrm{QNC}_f^0$ except that quantum circuits can use a gate for $\mathrm{OR}_k$ as an elementary gate for any $k$ bounded above by an arbitrary $\mathrm{poly}(n)$ for input length $n$. The definition of $\mathrm{QTC}_f^0$ is the same as that of $\mathrm{QAC}_f^0$ except that quantum circuits can use a gate for $\mathrm{TH}_k^t$ as an elementary gate for any $k$ bounded above by an arbitrary $\mathrm{poly}(n)$ and $1 \leq t \leq k$. Although some authors assume that quantum circuits can use only a bounded number of distinct one-qubit gates [15], we do not assume this since we consider the exact setting. Thus, the complexity classes in this paper are equal to or larger than those in the papers that considered only a bounded number of distinct one-qubit gates. We note, however, that one-qubit gates used in our circuits are only Hadamard gates $H$ and $Z(\pm\pi/2^k)$ gates for any integer $k \geq 0$, where, for any $\theta \in \mathbb{R}$,

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad Z(\theta) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}.$$
Figure 1: The quantum circuit for preparing $|\varphi_2\rangle$ when $n = 4$. The gate next to the Hadamard gate is an unbounded fan-out gate on four qubits, where the top qubit is the control qubit. The gate represented as "2" is a $Z(\pi/2^2)$ gate.



2.2 Høyer and Špalek's OR Reduction

The OR reduction is described as an $O(1)$-depth $O(n \log n)$-size quantum circuit for exactly reducing the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$, where $m = \lceil \log(n+1) \rceil$. We explain the idea of the circuit, which will be used in our circuits. We want to compute $\mathrm{OR}_n$ and let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state, where $x_j \in \{0,1\}$. The circuit outputs the $m$-qubit state $\bigotimes_{k=0}^{m-1} H|\varphi_k\rangle$, where

$$|\varphi_k\rangle = \frac{|0\rangle + e^{i\pi|x|/2^k}|1\rangle}{\sqrt{2}}$$

for any $0 \leq k \leq m-1$. If $|x| = 0$, $H|\varphi_k\rangle = |0\rangle$ for any $0 \leq k \leq m-1$ and thus the output state is $|0\rangle^{\otimes m}$. If $|x| \geq 1$, there exist $0 \leq a \leq m-1$ and $b \geq 0$ such that $|x| = 2^a(2b+1)$. A direct calculation shows that $H|\varphi_a\rangle = |1\rangle$ (since $e^{i\pi|x|/2^a} = e^{i\pi(2b+1)} = -1$) and thus the output state is orthogonal to $|0\rangle^{\otimes m}$. Therefore, the circuit exactly reduces the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$. For any $0 \leq k \leq m-1$, $|\varphi_k\rangle$ can be prepared by an $O(1)$-depth $O(n)$-size quantum circuit as depicted in Fig. 1. By using unbounded fan-out gates, all the states $|\varphi_k\rangle$ can be prepared in parallel and thus the depth and size of the circuit for the OR reduction are $O(1)$ and $O(nm) = O(n \log n)$, respectively.
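The exactness of the reduction can be checked numerically. The following is a minimal sketch, assuming the state $|\varphi_k\rangle = (|0\rangle + e^{i\pi|x|/2^k}|1\rangle)/\sqrt{2}$ described in this section; it only tracks the single-qubit amplitudes of $H|\varphi_k\rangle$ rather than simulating the circuit, and the function names are hypothetical.

```python
import cmath
import math

def or_reduction(x):
    """Return the (amp0, amp1) amplitudes of H|phi_k> for k = 0..m-1."""
    n = len(x)
    m = n.bit_length()          # equals ceil(log2(n + 1)) for n >= 1
    w = sum(x)                  # Hamming weight |x|
    out = []
    for k in range(m):
        e = cmath.exp(1j * math.pi * w / 2**k)
        # H maps (|0> + e|1>)/sqrt(2) to ((1 + e)/2)|0> + ((1 - e)/2)|1>
        out.append(((1 + e) / 2, (1 - e) / 2))
    return out

def overlap_with_all_zero(x):
    """Probability that all m output qubits are found in state |0>."""
    p = 1.0
    for amp0, _ in or_reduction(x):
        p *= abs(amp0) ** 2
    return p
```

Under these assumptions, `overlap_with_all_zero(x)` is 1 (up to rounding) when $x = 0^n$ and 0 otherwise, which is the sense in which the $m$ output qubits carry $\mathrm{OR}_n(x)$ exactly.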



3 Circuit for the OR Function 

3.1 Exponential-Size Circuit 

For any Boolean function $f_n : \{0,1\}^n \to \{0,1\}$ satisfying $f_n(0^n) = 0$, there exists a set of real numbers $\{r_a\}_{a \in \{0,1\}^n \setminus \{0^n\}}$ such that

$$f_n(x) = \sum_{a \in \{0,1\}^n \setminus \{0^n\}} r_a \mathrm{PA}_n^a(x)$$

for any $x \in \{0,1\}^n$. This is shown by using the Fourier expansion of $f_n$ [19], more precisely, by replacing the Fourier basis in the Fourier expansion of $f_n$ with a basis consisting of the parity functions $\mathrm{PA}_n^a$. In particular, the following representation of $\mathrm{OR}_n$ can be obtained by using the Fourier expansion of $\mathrm{OR}_n$. The proof is given in Appendix A.1.

Lemma 1 For any $x \in \{0,1\}^n$, $\mathrm{OR}_n(x) = \frac{1}{2^{n-1}} \sum_{a \in \{0,1\}^n \setminus \{0^n\}} \mathrm{PA}_n^a(x)$.
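The representation of $\mathrm{OR}_n$ as a normalized sum of all nonzero parities can be verified by brute force for small $n$. The sketch below uses exact rational arithmetic and assumes the normalization factor $1/2^{n-1}$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def parity(a, x):
    """PA_n^a(x): XOR of the bits x_j selected by a_j = 1."""
    return sum(aj & xj for aj, xj in zip(a, x)) % 2

def or_from_parities(x):
    """Evaluate (1 / 2^(n-1)) * sum of PA_n^a(x) over all nonzero a."""
    n = len(x)
    total = sum(parity(a, x) for a in product((0, 1), repeat=n) if any(a))
    return Fraction(total, 2 ** (n - 1))
```

The identity holds because, for any nonzero $x$, exactly half of all $a \in \{0,1\}^n$ satisfy $\mathrm{PA}_n^a(x) = 1$, and $a = 0^n$ is never one of them.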

The representation of $\mathrm{OR}_n$ implies an $O(1)$-depth $O(n2^n)$-size quantum circuit for $\mathrm{OR}_n$. The idea is that, when the input $x$ is given, we compute $\mathrm{PA}_n^a(x)$ for every $a$ in parallel and prepare the state $(|0\rangle^{\otimes(2^n-1)} + (-1)^{\mathrm{OR}_n(x)}|1\rangle^{\otimes(2^n-1)})/\sqrt{2}$ based on the representation. Applying an unbounded fan-out gate and a Hadamard gate to the state gives the desired state $|\mathrm{OR}_n(x)\rangle$. The point is that there exists an $O(1)$-depth $O(|a|)$-size quantum circuit for $\mathrm{PA}_n^a$ consisting of Hadamard gates and an unbounded fan-out gate as depicted in Fig. 2 [12].



Figure 2: The quantum circuit for $\mathrm{PA}_3$ [12].
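The claim that Hadamard gates turn an unbounded fan-out gate into a parity gate can be checked on small state vectors. The following is a sketch under the assumption that Hadamard gates are applied to every qubit before and after the fan-out gate; the bit layout (control on bit 0, targets on the remaining bits) and all function names are illustrative choices.

```python
def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (normalized Walsh-Hadamard transform)."""
    out = list(state)
    dim = len(out)
    h = 1
    while h < dim:
        for i in range(0, dim, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = dim ** 0.5
    return [v / norm for v in out]

def fanout(idx, q):
    """Fan-out on basis index idx: if bit 0 (control) is 1, flip bits 1..q-1."""
    return idx ^ (((1 << q) - 1) ^ 1) if idx & 1 else idx

def parity_gate(idx, q):
    """Parity gate on basis index idx: XOR the parity of bits 1..q-1 into bit 0."""
    return idx ^ (bin(idx >> 1).count("1") & 1)

def permute(state, f):
    """Apply the basis-state permutation f to a state vector."""
    out = [0.0] * len(state)
    for idx, amp in enumerate(state):
        out[f(idx)] += amp
    return out

def is_parity_gate(q, tol=1e-9):
    """Check that H^q * fanout * H^q acts as the parity gate on all basis states."""
    dim = 1 << q
    for i in range(dim):
        state = [0.0] * dim
        state[i] = 1.0
        state = hadamard_all(permute(hadamard_all(state), lambda idx: fanout(idx, q)))
        expected = parity_gate(i, q)
        if any(abs(amp - (1.0 if j == expected else 0.0)) > tol
               for j, amp in enumerate(state)):
            return False
    return True
```

This is the many-target version of the familiar fact that conjugating a CNOT gate by Hadamard gates on both qubits exchanges its control and target.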

To describe the circuit for $\mathrm{OR}_n$ more precisely, let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:

1. Copy the input state $|x\rangle$ and apply the circuit for $\mathrm{PA}_n^a$ to each copy for every $a \in \{0,1\}^n \setminus \{0^n\}$ in parallel to prepare the state $\bigotimes_{a \in \{0,1\}^n \setminus \{0^n\}} |\mathrm{PA}_n^a(x)\rangle$.

2. Apply a Hadamard gate and an unbounded fan-out gate to ancillary qubits (initialized to $|0\rangle$) to prepare the $(2^n-1)$-qubit state $(|0\rangle^{\otimes(2^n-1)} + |1\rangle^{\otimes(2^n-1)})/\sqrt{2}$.

3. Apply controlled-$Z(\pi/2^{n-1})$ gates in parallel to the states in Steps 1 and 2 to prepare the state

$$\frac{|0\rangle^{\otimes(2^n-1)} + e^{i\frac{\pi}{2^{n-1}} \sum_{a \in \{0,1\}^n \setminus \{0^n\}} \mathrm{PA}_n^a(x)} |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}} = \frac{|0\rangle^{\otimes(2^n-1)} + (-1)^{\mathrm{OR}_n(x)} |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}},$$

where Lemma 1 implies the equation.

4. Apply an unbounded fan-out gate and a Hadamard gate to the state in Step 3 to prepare the desired state $|\mathrm{OR}_n(x)\rangle$.

For any $0 \leq j \leq n-1$, let $e(j) = e_0 \cdots e_{n-1} \in \{0,1\}^n$ such that $e_k = 1$ if $k = j$ and $0$ otherwise. In Step 1, since the input state $|x_j\rangle = |\mathrm{PA}_n^{e(j)}(x)\rangle$, it suffices to prepare the state $|\mathrm{PA}_n^a(x)\rangle$ for every $a \in \{0,1\}^n$ such that $|a| \geq 2$. To prepare the states in parallel, we require the state $|x_j\rangle^{\otimes(2^{n-1}-1)}$ for any $0 \leq j \leq n-1$. Thus, before applying the circuit for $\mathrm{PA}_n^a$, we apply an unbounded fan-out gate to the input qubit in state $|x_j\rangle$ and $2^{n-1}-1$ ancillary qubits for every $0 \leq j \leq n-1$ in parallel. In Step 2, we apply an unbounded fan-out gate to the ancillary qubits in state $(H|0\rangle)|0\rangle^{\otimes(2^n-2)}$. In Step 3, we use the qubit in state $|\mathrm{PA}_n^a(x)\rangle$ as the control qubit of the controlled-$Z(\pi/2^{n-1})$ gate. In Step 4, we first apply an unbounded fan-out gate to the state in Step 3 to disentangle the last $2^n-2$ qubits and obtain the state $(|0\rangle + (-1)^{\mathrm{OR}_n(x)}|1\rangle)/\sqrt{2}$. Thus, the Hadamard gate outputs the desired state. By the construction, the depth of the whole circuit does not depend on $n$. Since Step 1 is the dominant part and uses $n$ unbounded fan-out gates on $2^{n-1}$ qubits, the size of the whole circuit is $O(n2^n)$. This implies the following lemma. The details of the proof are given in Appendix A.2.

Lemma 2 There exists an $O(1)$-depth $O(n2^n)$-size quantum circuit for $\mathrm{OR}_n$.

Remark: Hoban et al. considered a restricted model of measurement-based quantum computation, where the adaptivity of measurements is removed [13]. They showed that, if we are allowed to use the $(2^n-1)$-qubit state in Step 2, any Boolean function $f_n$ can be computed exactly in the model by the procedure based on the above-mentioned representation of $f_n$. The circuit in Lemma 2 can be considered as a simulation of the procedure for computing $\mathrm{OR}_n$ in the model. The unbounded fan-out gates are mainly used for preparing the $(2^n-1)$-qubit state and for computing $\mathrm{PA}_n^a$.



3.2 Proof of Theorem 1 

We show Theorem 1 using Høyer and Špalek's OR reduction and Lemma 2. Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:

1. Apply Høyer and Špalek's OR reduction to the input state $|x\rangle$ to prepare the $m$-qubit state $\bigotimes_{k=0}^{m-1} H|\varphi_k\rangle$, where $m = \lceil \log(n+1) \rceil$.

2. Apply the circuit in Lemma 2 to the state in Step 1 to prepare the desired state $|\mathrm{OR}_n(x)\rangle$.

Since Step 1 exactly reduces the problem of computing $\mathrm{OR}_n$ to that of computing $\mathrm{OR}_m$ in depth $O(1)$ and with size $O(n \log n)$, Step 2 outputs the desired state. Since the input size to Step 2 is $m$, the depth and size of the circuit in Step 2 are $O(1)$ and $O(m2^m) = O(n \log n)$, respectively. Thus, the depth and size of the whole circuit are $O(1)$ and $O(n \log n)$, respectively. This completes the proof.
Theorem 1 immediately implies that $\mathrm{OR}_n$ is in $\mathrm{QNC}_f^0$ and thus the following relationship holds:

Corollary 1 $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0$.

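Since $\mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$ [15], it holds that $\mathrm{QNC}_f^0 = \mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$. Corollary 1 and the relationship $\mathrm{QAC}_f^0 = \mathrm{QTC}_f^0$ immediately imply that $\mathrm{QNC}_f^k = \mathrm{QAC}_f^k$ and $\mathrm{QAC}_f^k = \mathrm{QTC}_f^k$, respectively, for any integer $k \geq 0$, where $\mathrm{QNC}_f^k$, $\mathrm{QAC}_f^k$, and $\mathrm{QTC}_f^k$ are defined similarly to $\mathrm{QNC}_f^0$, $\mathrm{QAC}_f^0$, and $\mathrm{QTC}_f^0$, respectively, except that they deal with $O(\log^k n)$-depth circuits in place of $O(1)$-depth ones. Therefore, more generally, it holds that $\mathrm{QNC}_f^k = \mathrm{QAC}_f^k = \mathrm{QTC}_f^k$ for any integer $k \geq 0$.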

For any integer constant $c \geq 1$, the size of the circuit in Theorem 1 can be decreased to $O(n \log^{(c)} n)$ without increasing the depth asymptotically, where $\log^{(c)} n$ is the $c$-times iterated logarithm $\log \cdots \log n$. To show this, we divide the $n$ input qubits into $n/\log n$ blocks of $\log n$ qubits. For each block, we apply the circuit in Theorem 1 to compute $\mathrm{OR}_{\log n}$. We obtain $n/\log n$ output qubits and apply the circuit again to the output qubits to compute $\mathrm{OR}_{n/\log n}$, which yields the desired output. The depth and size of the whole circuit are $O(1)$ and $O(n \log^{(2)} n)$, respectively. Using the resulting circuit, we repeat this size-reduction procedure. After $c-1$ repetitions, we obtain an $O(n \log^{(c)} n)$-size circuit.
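The size bound for one round of this procedure can be spelled out as follows. This is a worked sketch that takes the $O(n \log n)$ size of Theorem 1 as given; the symbol $S_2(n)$ for the size after one round is introduced here only for illustration.

```latex
\begin{align*}
S_2(n) &= \underbrace{\frac{n}{\log n} \cdot O(\log n \cdot \log\log n)}_{\text{one circuit per block of } \log n \text{ qubits}}
        + \underbrace{O\!\left(\frac{n}{\log n} \cdot \log\frac{n}{\log n}\right)}_{\text{OR of the } n/\log n \text{ block outputs}} \\
       &= O(n \log\log n) + O(n) \\
       &= O(n \log^{(2)} n).
\end{align*}
```

Each further round replaces the per-block circuit by the circuit from the previous round, which adds one more iterated logarithm while the second term stays $O(n)$.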

The circuit for $\mathrm{OR}_n$ yields a circuit for $\mathrm{EX}_n^t$ [15]. To construct the circuit, it suffices to prepare $Z(-t\pi/2^k)|\varphi_k\rangle$ in place of $|\varphi_k\rangle$ in Høyer and Špalek's OR reduction and to negate the final output of the circuit in Theorem 1. This is done by only adding a $Z(-t\pi/2^k)$ gate for every $0 \leq k \leq m-1$ and a NOT gate. Thus, the depth and size of the resulting circuit are asymptotically the same as those in Theorem 1. This yields an $O(1)$-depth $O(n \log n)$-size quantum circuit for $\mathrm{EX}_n^t$ for any $0 \leq t \leq n$.
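The effect of the extra $Z(-t\pi/2^k)$ gates can be checked numerically. The sketch below assumes that the gate shifts the phase of $|\varphi_k\rangle$ from $\pi|x|/2^k$ to $\pi(|x|-t)/2^k$, so that the reduction outputs the all-zero state exactly when $|x| = t$; the function name is hypothetical and only single-qubit amplitudes are tracked.

```python
import cmath
import math

def exact_via_reduction(x, t):
    """Return EX_n^t(x) as predicted by the shifted OR reduction."""
    n = len(x)
    m = n.bit_length()              # equals ceil(log2(n + 1)) for n >= 1
    w = sum(x)                      # Hamming weight |x|
    p_all_zero = 1.0
    for k in range(m):
        # phase of Z(-t*pi/2^k)|phi_k> relative to |0>
        e = cmath.exp(1j * math.pi * (w - t) / 2**k)
        p_all_zero *= abs((1 + e) / 2) ** 2
    # the OR of the m output bits is 0 iff they are all zero; negating it gives EX
    return round(p_all_zero)
```

Since $|\,|x| - t\,| \leq n < 2^m$, any nonzero difference has the form $\pm 2^a(2b+1)$ with $a \leq m-1$, so the same argument as for the OR reduction applies.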

4 Circuit for the Threshold Function 

First, we describe a constant-depth circuit for $\mathrm{TH}_n^t$ based on the constant-depth circuits for $\mathrm{EX}_n^k$ described above. Then, we describe another constant-depth circuit for $\mathrm{TH}_n^t$ based on a circuit for the counting function. Next, we combine these two circuits to show Theorem 2.

4.1 Exact-Function-Based and Counting-Function-Based Circuits

We first consider a constant-depth circuit for $\mathrm{TH}_n^t$ based on the circuits for $\mathrm{EX}_n^k$ when $1 \leq t \leq \lceil n/2 \rceil$. Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:

1. Copy the input state $|x\rangle$ and apply the circuit for $\mathrm{EX}_n^k$ to each copy for every $0 \leq k \leq t-1$ in parallel to prepare the state $\bigotimes_{k=0}^{t-1} |\mathrm{EX}_n^k(x)\rangle$.

2. Apply the circuit for $\mathrm{PA}_t$ and a NOT gate to the state in Step 1 to prepare the state $|\bigoplus_{k=0}^{t-1} \mathrm{EX}_n^k(x) \oplus 1\rangle$.

If $|x| \geq t$, $\mathrm{EX}_n^k(x) = 0$ for every $0 \leq k \leq t-1$. If $|x| < t$, there exists exactly one $0 \leq k \leq t-1$ such that $\mathrm{EX}_n^k(x) = 1$. Thus, the state in Step 2 is equal to the desired state $|\mathrm{TH}_n^t(x)\rangle$. The depth and size of the circuit in Step 1 are $O(1)$ and $O(tn \log n)$, respectively. As depicted in Fig. 2, the depth and size of the circuit for $\mathrm{PA}_t$ are $O(1)$ and $O(t)$, respectively. Thus, the depth and size of the whole circuit are $O(1)$ and $O(tn \log n)$, respectively. When $\lceil n/2 \rceil < t \leq n$, we modify the circuit in such a way that it prepares the state $|\bigoplus_{k=t}^{n} \mathrm{EX}_n^k(x)\rangle$ in Step 2. This implies the following lemma:
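The two ways of assembling the threshold function from exact functions can be checked classically. The sketch below only models the Boolean logic of Step 2 (a parity over the outputs of the exact functions, with a final NOT in the first variant); the function names are illustrative.

```python
def exact(k, x):
    """EX_n^k(x): 1 if the Hamming weight of x equals k, and 0 otherwise."""
    return int(sum(x) == k)

def threshold_low(t, x):
    """TH_n^t(x) as 1 XOR the parity of EX_n^k(x) over 0 <= k <= t-1."""
    acc = 1                         # the final NOT gate
    for k in range(t):
        acc ^= exact(k, x)
    return acc

def threshold_high(t, x):
    """TH_n^t(x) as the parity of EX_n^k(x) over t <= k <= n."""
    acc = 0
    for k in range(t, len(x) + 1):
        acc ^= exact(k, x)
    return acc
```

Both variants are correct for every $t$; they differ only in how many exact-function circuits they use ($t$ versus $n-t+1$), which is why the first is preferred for small $t$ and the second for large $t$.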
Lemma 3 There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}_n^t$:

• An $O(tn \log n)$-size circuit for any $1 \leq t \leq \lceil n/2 \rceil$.

• An $O((n-t+1)n \log n)$-size circuit for any $\lceil n/2 \rceil < t \leq n$.

When $t$ is an integer constant, the size is $O(n \log n)$. On the other hand, when $t = \lceil n/2 \rceil$, in other words, for the majority function, the size is $O(n^2 \log n)$.

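We define the counting function on $n$ bits, denoted as $\mathrm{CO}_n$, as $\mathrm{CO}_n(x) = s_0 \cdots s_{m-1}$, where $x \in \{0,1\}^n$, $s_j \in \{0,1\}$, $m = \lceil \log(n+1) \rceil$, and $|x| = \sum_{j=0}^{m-1} s_j 2^j$, i.e., it computes the binary representation of the Hamming weight of the input. The quantum operation for computing $\mathrm{CO}_n$ is defined as

$$\left(\bigotimes_{j=0}^{n-1} |x_j\rangle\right) \left(\bigotimes_{j=0}^{m-1} |z_j\rangle\right) \mapsto \left(\bigotimes_{j=0}^{n-1} |x_j\rangle\right) \left(\bigotimes_{j=0}^{m-1} |z_j \oplus s_j\rangle\right),$$

where $x_j, z_j \in \{0,1\}$. This operation is also denoted as $\mathrm{CO}_n$.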

We construct a constant-depth circuit for $\mathrm{CO}_n$. Let $|x\rangle$ be an input state. Since $|x| = \sum_{j=0}^{m-1} s_j 2^j$, $|\varphi_k\rangle$ in Høyer and Špalek's OR reduction is $(|0\rangle + e^{i\pi \sum_{j=0}^{k} s_j/2^{k-j}}|1\rangle)/\sqrt{2}$. This implies that $|\varphi_0\rangle \cdots |\varphi_{m-1}\rangle = F_{2^m}|s_0\rangle \cdots |s_{m-1}\rangle$. Thus, to obtain the desired state $|s_0\rangle \cdots |s_{m-1}\rangle$, it suffices to implement the following type of the inverse of the QFT: $|x\rangle(F_{2^m}|s_0\rangle \cdots |s_{m-1}\rangle) \mapsto |x\rangle|s_0\rangle \cdots |s_{m-1}\rangle$. Our idea for implementing this operation is to perform $A(\theta)$-measurements on many $|\varphi_k\rangle$'s in parallel for appropriate $\theta$'s, where, for any $\theta \in \mathbb{R}$, an $A(\theta)$-measurement is the one-qubit projective measurement in the basis $(|0\rangle + e^{i\theta}|1\rangle)/\sqrt{2}$, $(|0\rangle - e^{i\theta}|1\rangle)/\sqrt{2}$, which correspond to the classical outcomes 0 and 1, respectively. The classical outcomes imply each $s_k$ exactly.
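To see when an $A(\theta)$-measurement gives a deterministic outcome, one can compute the outcome probabilities for a state of the form $(|0\rangle + e^{i\phi}|1\rangle)/\sqrt{2}$. The following is a minimal sketch; the function name and the parameter name `phi` are illustrative, not from the paper.

```python
import cmath
import math

def outcome_probabilities(phi, theta):
    """Probabilities of outcomes 0 and 1 when (|0> + e^{i phi}|1>)/sqrt(2)
    is measured in the basis (|0> +/- e^{i theta}|1>)/sqrt(2)."""
    rel = cmath.exp(1j * (phi - theta))
    amp0 = (1 + rel) / 2            # overlap with (|0> + e^{i theta}|1>)/sqrt(2)
    amp1 = (1 - rel) / 2            # overlap with (|0> - e^{i theta}|1>)/sqrt(2)
    return abs(amp0) ** 2, abs(amp1) ** 2
```

The outcome is deterministic exactly when $\phi - \theta$ is a multiple of $\pi$. For example, a state with phase $\pi(s_1 + s_0/2)$ measured with $\theta = \pi s_0/2$ returns $s_1$ with certainty, whereas a wrong guess for $s_0$ gives a uniformly random bit.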

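For example, when $m = 3$, we first prepare the state $|\varphi_0\rangle|\varphi_1\rangle^{\otimes 2}|\varphi_2\rangle^{\otimes 4}$ with a slightly modified version of Høyer and Špalek's OR reduction, where

$$|\varphi_0\rangle = \frac{|0\rangle + e^{i\pi s_0}|1\rangle}{\sqrt{2}}, \quad |\varphi_1\rangle = \frac{|0\rangle + e^{i\pi(s_1 + s_0/2)}|1\rangle}{\sqrt{2}}, \quad |\varphi_2\rangle = \frac{|0\rangle + e^{i\pi(s_2 + s_1/2 + s_0/4)}|1\rangle}{\sqrt{2}}.$$

We can easily obtain $s_0$ since it is equal to the classical outcome $s_0^0$ of an $A(0)$-measurement on $|\varphi_0\rangle$. The value $s_1$ is determined depending on $s_0$. When $s_0 = 0$, $s_1$ is equal to the classical outcome $s_1^0$ of an $A(0)$-measurement on $|\varphi_1\rangle$. When $s_0 = 1$, $s_1$ is equal to the classical outcome $s_1^1$ of an $A(\pi/2)$-measurement on $|\varphi_1\rangle$. In other words, $s_1 = s_1^{s_0}$. Similarly, we perform $A(0)$-, $A(\pi/4)$-, $A(\pi/2)$-, and $A(3\pi/4)$-measurements on $|\varphi_2\rangle^{\otimes 4}$ and let $s_2^{00}$, $s_2^{10}$, $s_2^{01}$, and $s_2^{11}$ be the classical outcomes, respectively. By the definition of the measurements, $s_2 = s_2^{s_0 s_1}$. These relationships imply

$$s_1 = [s_1^0(1 \oplus 0 \oplus s_0^0)] \oplus [s_1^1(1 \oplus 1 \oplus s_0^0)],$$

$$s_2 = [s_2^{00}(1 \oplus 0 \oplus s_0^0)(1 \oplus 0 \oplus s_1^0)] \oplus [s_2^{10}(1 \oplus 1 \oplus s_0^0)(1 \oplus 0 \oplus s_1^1)] \oplus [s_2^{01}(1 \oplus 0 \oplus s_0^0)(1 \oplus 1 \oplus s_1^0)] \oplus [s_2^{11}(1 \oplus 1 \oplus s_0^0)(1 \oplus 1 \oplus s_1^1)].$$

Thus, if we have sufficiently many copies of the classical outcomes, we can compute $s_k$ for every $1 \leq k \leq m-1$ in parallel using the circuits for $\mathrm{AND}_{k+1}$ and $\mathrm{PA}_{2^k}$. We note that we can perform all the above measurements in parallel. We define the function $t_k(y)$ on $k$ bits as $t_k(y) = s_k^y \wedge \bigwedge_{j=0}^{k-1}(1 \oplus y_j \oplus s_j^{y_0 \cdots y_{j-1}})$ for any $y = y_0 \cdots y_{k-1} \in \{0,1\}^k$, where the value $s_j^{y_0 \cdots y_{j-1}}$ is regarded as $s_0^0$ when $j = 0$. It holds that $s_1 = t_1(0) \oplus t_1(1)$ and $s_2 = t_2(00) \oplus t_2(10) \oplus t_2(01) \oplus t_2(11)$.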

To describe the circuit for $\mathrm{CO}_n$ more precisely and generally, let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. The circuit is described as follows:

1. Apply a slightly modified version of Høyer and Špalek's OR reduction to the input state $|x\rangle$ to prepare the state $\bigotimes_{k=0}^{m-1} |\varphi_k\rangle^{\otimes 2^k}$.

2. Perform $A(0)$- and $A(\pi \sum_{j=0}^{k-1} y_j/2^{k-j})$-measurements for every $1 \leq k \leq m-1$ and $y = y_0 \cdots y_{k-1} \in \{0,1\}^k$ in parallel on the state in Step 1 to obtain the values $s_0^0, s_k^y \in \{0,1\}$ such that $s_0 = s_0^0$ and $s_k = s_k^{s_0 \cdots s_{k-1}}$.

3. Prepare $2^m - 1$ copies of the state $|s_0^0\rangle$ and $2^{m-k} - 1$ copies of the state $|s_k^y\rangle$ and apply the circuit for $\mathrm{AND}_{k+1}$ (constructed by the circuit for $\mathrm{OR}_{k+1}$ in Section 3) to the states for every $1 \leq k \leq m-1$ and $y \in \{0,1\}^k$ in parallel to prepare the state $\bigotimes_{1 \leq k \leq m-1,\, y \in \{0,1\}^k} |t_k(y)\rangle$.

4. Apply the circuit for $\mathrm{PA}_{2^k}$ for every $1 \leq k \leq m-1$ in parallel to the state in Step 3 to prepare the state $|s_0^0\rangle \bigotimes_{1 \leq k \leq m-1} |\bigoplus_{y \in \{0,1\}^k} t_k(y)\rangle$.

Since $t_k(y) = s_k$ if $y = s_0 \cdots s_{k-1}$ and $0$ otherwise for any $1 \leq k \leq m-1$, $\bigoplus_{y \in \{0,1\}^k} t_k(y) = s_k$. Thus, Step 4 outputs the desired state. By the construction, the depth of the whole circuit does not depend on $n$. Since Step 1 is the dominant part and the state in Step 1 can be prepared with a circuit of size $O(n \sum_{k=0}^{m-1} 2^k) = O(n^2)$, the size of the whole circuit is $O(n^2)$. This implies the following lemma. The details of the proof are given in Appendix A.3.
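The classical post-processing in Steps 3 and 4 can be sketched as follows. As a simplifying assumption, the simulated measurement returns the correct bit $s_k$ when $y$ equals the true prefix $s_0 \cdots s_{k-1}$ and an arbitrary bit otherwise; this is more pessimistic than the actual measurement statistics and is used only to show that the decoding ignores outcomes obtained with wrong angles. All names are hypothetical.

```python
import random
from itertools import product

def measure_outcomes(w, m, rng):
    """Simulated outcomes s_k^y for Hamming weight w, keyed by (k, y)."""
    s = [(w >> j) & 1 for j in range(m)]        # s_0 is the lowest-order bit
    outcomes = {}
    for k in range(m):
        for y in product((0, 1), repeat=k):
            if list(y) == s[:k]:
                outcomes[(k, y)] = s[k]         # correct angle: deterministic
            else:
                outcomes[(k, y)] = rng.randint(0, 1)   # wrong angle: arbitrary
    return outcomes

def decode(outcomes, m):
    """Recover s_0..s_{m-1} using t_k(y) and a parity over all y."""
    bits = [outcomes[(0, ())]]
    for k in range(1, m):
        acc = 0
        for y in product((0, 1), repeat=k):
            t = outcomes[(k, y)]
            for j in range(k):
                # factor (1 XOR y_j XOR s_j^{y_0..y_{j-1}}) of the AND
                t &= 1 ^ y[j] ^ outcomes[(j, y[:j])]
            acc ^= t                            # parity over y in {0,1}^k
        bits.append(acc)
    return bits
```

The decoding is robust because, for a wrong $y$, the first position $j$ where $y_j \neq s_j$ has a correct prefix, so the corresponding factor is 0 and $t_k(y)$ vanishes regardless of the arbitrary outcomes.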

Lemma 4 There exists an $O(1)$-depth $O(n^2)$-size quantum circuit for $\mathrm{CO}_n$.

Lemma 4 yields an $O(1)$-depth $O(n^2)$-size quantum circuit for $\mathrm{TH}_n^t$. To construct the circuit, it suffices to add a circuit for comparing $t$ with the output of the circuit for $\mathrm{CO}_n$. We can construct an $O(1)$-depth $\mathrm{poly}(m)$-size quantum circuit for the comparison using the circuit for addition in [7].

4.2 Combination of the Two Circuits 

A careful combination of the circuits in Lemmas 3 and 4 yields a smaller circuit for $\mathrm{TH}_n^t$. We explain
the idea in the case when $1 \le t \le \lceil n/2 \rceil$. When the input $x$ is given, before using the first circuit in
Lemma 3, we compute some low-order bits (not all the bits!) of the binary representation of $|x|$ by the
circuit in Lemma 4. Since we know the low-order bits, it is not necessary to check whether $\mathrm{EX}_n^k(x) = 1$
for every $0 \le k \le t-1$ as in Lemma 3. It suffices to consider $0 \le k \le t-1$ such that the low-order bits
of the binary representation of $k$ are equal to those computed by the circuit in Lemma 4. The number
of $k$'s we need to consider is decreased and thus the size of the whole circuit can be decreased.
More precisely, the circuit is described as follows: 

1. Apply the circuit in Lemma 4 to the input state $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ to prepare the state
$|s_0\rangle \cdots |s_{l-1}\rangle$, where $s_0 \cdots s_{l-1}$ are the $l$ low-order bits of the binary representation of $|x|$ and
$l$ is an integer satisfying $0 \le l < \lceil \log(t+1) \rceil$.

2. Apply the first circuit in Lemma 3 to the input state $|x\rangle$ to prepare the state $|\bigoplus_k \mathrm{EX}_n^k(x) \oplus 1\rangle$,
where we consider only $0 \le k \le t-1$ such that the $l$ low-order bits of the binary representation
of $k$ are equal to $s_0 \cdots s_{l-1}$.

Step 2 outputs the desired state as in Lemma 3. It is obvious that the depth does not depend
on $n$. The size of the circuit in Step 1 is $O(2^l n)$ and that in Step 2 is $O(2^{-l} t n \log n)$ since there are
at most $2^{-l} t$ $k$'s we need to consider. The same idea with the second circuit in Lemma 3 works when
$\lceil n/2 \rceil < t \le n$. This implies the following lemma. The details of the proof are given in Appendix A.4.

Lemma 5 There exist the following $O(1)$-depth quantum circuits for $\mathrm{TH}_n^t$:

• An $O(2^l n + 2^{-l} t n \log n)$-size circuit for any $1 \le t \le \lceil n/2 \rceil$ and $0 \le l < \lceil \log(t+1) \rceil$.

• An $O(2^l n + 2^{-l}(n - t + 1) n \log n + n \log n)$-size circuit for any $\lceil n/2 \rceil < t \le n$ and $0 \le l <
\lceil \log(t+1) \rceil$.

By setting $l$ appropriately depending on $t$, Lemma 5 implies Theorem 2. The proof is given in
Appendix A.5. The size of the circuit for $\mathrm{TH}_n^{\lceil n/2 \rceil}$ in Lemma 3 is $O(n^2 \log n)$ and it can be decreased
to $O(n^2)$ by Lemma 4. Theorem 2 with $t = \lceil n/2 \rceil$ yields an even smaller circuit:

Corollary 2 There exists an $O(1)$-depth $O(n\sqrt{n \log n})$-size quantum circuit for $\mathrm{TH}_n^{\lceil n/2 \rceil}$.
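The trade-off in the first bound of Lemma 5 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it evaluates the two leading terms $2^l n + 2^{-l} t n \log n$ with all hidden constants set to 1 and base-2 logarithms, and it uses the choice of $l$ from Appendix A.5. The function names are hypothetical.

```python
import math

def size_bound(n, t, l):
    """Leading terms 2^l * n + 2^(-l) * t * n * log2(n), constants taken as 1."""
    return 2**l * n + t * n * math.log2(n) / 2**l

def balanced_l(n, t):
    """l = ceil(log2(sqrt(t * log2 n))) - 1, clamped at 0."""
    return max(0, math.ceil(math.log2(math.sqrt(t * math.log2(n)))) - 1)

n, t = 1024, 512
l = balanced_l(n, t)
print(l, size_bound(n, t, l), size_bound(n, t, 0))
```

With this $l$, the value $2^l$ lies between $\sqrt{t \log n}/2$ and $\sqrt{t \log n}$, so both terms are within a constant factor of $n\sqrt{t \log n}$, whereas $l = 0$ leaves the second term at $t n \log n$.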
5 Discrete Logarithm Algorithm Using a $\mathrm{QNC}^0_f$ Oracle

Let $q > 5$ be a safe prime, i.e., a prime of the form $q = 2p + 1$ for some prime $p > 2$. In the
following, as in the cryptographic literature, we assume that there exist infinitely many safe primes.
Let $G_q = (\mathbb{Z}/q\mathbb{Z})^*$, the multiplicative group of integers modulo $q$. It is known that there exists a
generator $1 < g_q < q-1$ of $G_q$ and thus $G_q = \{g_q^0 = 1, g_q, \ldots, g_q^{q-2}\}$ and $g_q^{q-1} = 1 \bmod q$. The discrete
logarithm problem (DLP) over $G_q$ (with respect to given $q$ and $g_q$) is to find $0 \le l_q \le q-2$ such that
$g_q^{l_q} = x_q \bmod q$ for an input $x_q \in G_q$, where the problem size is $n = \lceil \log q \rceil$ and the order of $G_q$, i.e.,
$q - 1$, and its decomposition $2p$ are known. Since it seems difficult to reduce the DLP over $G_q$ to DLPs
over groups of sufficiently small orders, it is plausible that it cannot be solved by a polynomial-time
bounded-error classical algorithm, in other words, that the DLP over $G_q$ is classically hard.

Although we can directly consider the DLP over $G_q$, for simplicity, we consider simpler DLPs
obtained by the reduction method in [20]. Since the order of $G_q$ is $2p$ and $\gcd(2,p) = 1$, the DLP
over $G_q$ with an input $x_q$ can be reduced to the following two DLPs by a poly($n$)-time exact classical
algorithm. One is the DLP over the group generated by $g_q^p$ with the input $x_q^p$, which is solvable by a
poly($n$)-time exact classical algorithm since the order of $g_q^p$ is 2. The other is the DLP over the group
$G$ generated by $g = g_q^2$ with the input $x = x_q^2$. Thus, to show Theorem 3, it suffices to show that, if $F_p$
is in $\mathrm{QNC}^0_f$, there exists a poly($n$)-time exact classical algorithm for the DLP over $G$ using the $\mathrm{QNC}^0_f$
oracle, which solves, in classical constant time, a problem that is solvable exactly by a $\mathrm{QNC}^0_f$ circuit.
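The reduction to the two subgroup DLPs can be sketched classically as follows. This is a minimal illustration, not the method of [20] verbatim: the brute-force search `brute_force_dlog` is a hypothetical stand-in for whatever solves each subgroup problem (in the paper, the order-$p$ part is handled by the oracle), and the recombination step is ordinary Chinese remaindering for the moduli 2 and $p$.

```python
def brute_force_dlog(g, x, order, q):
    """Stand-in subgroup solver: find e in [0, order) with g^e = x (mod q)."""
    acc = 1
    for e in range(order):
        if acc == x:
            return e
        acc = acc * g % q
    raise ValueError("x is not in the subgroup generated by g")

def dlog_via_reduction(g_q, x_q, q):
    """Recover l_q in [0, 2p) from the DLPs in the subgroups of order 2 and p."""
    p = (q - 1) // 2
    # Order-2 part: g_q^p = -1 (mod q), so this logarithm is l_q mod 2.
    l_mod_2 = brute_force_dlog(pow(g_q, p, q), pow(x_q, p, q), 2, q)
    # Order-p part: g = g_q^2 generates G and x = x_q^2 = g^(l_q mod p).
    l_mod_p = brute_force_dlog(pow(g_q, 2, q), pow(x_q, 2, q), p, q)
    # Chinese remaindering: p is odd, so l_mod_p and l_mod_p + p differ in parity.
    return l_mod_p if l_mod_p % 2 == l_mod_2 else l_mod_p + p

# Example with the safe prime q = 23 = 2 * 11 + 1 and the generator 5.
print(dlog_via_reduction(5, pow(5, 17, 23), 23))
```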

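We analyze (a slightly modified version of) van Dam's exact algorithm for the DLP [23], which
consists of two parts. The first part is independent of the input $x \in G$ and transforms the state
$|0\rangle^{\otimes(m+n)}$ into the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle$ as follows, where $m = \lceil \log p \rceil$, the $n$-qubit state
$|\chi^s\rangle = \frac{1}{\sqrt{p}} \sum_{r=0}^{p-1} \omega_p^{rs} |g^r \bmod q\rangle$ for any $0 \le s \le p-1$, and $\omega_p = e^{2\pi i/p}$:

1. Apply $F_p$ to the first $m$ qubits of the state $|0\rangle^{\otimes(m+n)}$ to prepare the state $\frac{1}{\sqrt{p}} \sum_{r=0}^{p-1} |r\rangle|0\rangle^{\otimes n}$.

2. Apply the modular exponentiation operation $|r\rangle|0\rangle \mapsto |r\rangle|g^r \bmod q\rangle$ to the state in Step 1 to
prepare the state $\frac{1}{\sqrt{p}} \sum_{r=0}^{p-1} |r\rangle|g^r \bmod q\rangle$.

3. Apply $F_p$ to the first $m$ qubits of the state in Step 2 to prepare the state $\frac{1}{\sqrt{p}} \sum_{s=0}^{p-1} |s\rangle|\chi^s\rangle$.

4. Apply the amplitude amplification procedure to prepare the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle$.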

Steps 1 and 3 are in $\mathrm{QNC}^0_f$ by our assumption. Since $g^r = \prod_{j=0}^{m-1} g^{r_j 2^j} \bmod q$ when $r = \sum_{j=0}^{m-1} r_j 2^j$
and $r_j \in \{0,1\}$, the modular exponentiation operation in Step 2 can be implemented by using the
iterated multiplication operation with the values $g^{2^j} \bmod q$ that can be pre-computed by a poly($n$)-
time exact classical algorithm [8]. It holds that $\mathrm{QNC}^0_f = \mathrm{QTC}^0_f$ as shown in Section 3 and $\mathrm{QTC}^0_f$ includes
arithmetic operations¹ such as the iterated multiplication operation [22]. Thus, Step 2 is in $\mathrm{QNC}^0_f$.
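The decomposition of the modular exponentiation into an iterated product can be sketched classically. The code below is a minimal illustration with hypothetical function names. It mirrors only the arithmetic identity $g^r = \prod_j g^{r_j 2^j} \bmod q$, not the constant-depth circuit for iterated multiplication.

```python
def precompute_powers(g, m, q):
    """Classical pre-processing: g^(2^j) mod q for j = 0..m-1 by repeated squaring."""
    powers = [g % q]
    for _ in range(m - 1):
        powers.append(powers[-1] ** 2 % q)
    return powers

def modexp_by_iterated_product(r, powers, q):
    """Multiply the pre-computed factors selected by the bits r_j of r."""
    result = 1
    for j, g_pow in enumerate(powers):
        if (r >> j) & 1:
            result = result * g_pow % q
    return result

# Example: q = 23, p = 11, m = 4, and g = 2 (an element of order 11 modulo 23).
powers = precompute_powers(2, 4, 23)
print([modexp_by_iterated_product(r, powers, 23) for r in range(11)])
```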

The procedure in Step 4 is similar to the one in P]. We define the algorithm A' as Steps 1, 2, 
and 3, and the good state $|A'\rangle = \frac{1}{\sqrt{p}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle$. Since $\langle A'|A'\rangle = 1 - 1/p$, it is easy to transform $A'$
into a new algorithm $A$ with success probability $1/2$ using one ancillary qubit. Thus, we require only
one application of a Grover iteration with $A$. The Grover iteration includes operations that change the
phases of the states $|0\rangle^{\otimes k}$ with some $k \le m + n + 1$. These operations can be implemented by using
$\mathrm{OR}_n$, which is in $\mathrm{QNC}^0_f$ as shown in Section 3. Thus, the whole procedure in Step 4 is in $\mathrm{QNC}^0_f$.
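Why a success probability of exactly $1/2$ needs only one Grover iteration can be checked in a two-dimensional model. The sketch below rests on several assumptions that are not spelled out at this point of the text: the state $A|0\rangle$ is written in the basis (good, bad) with good probability $a$, both reflections apply the phase $i$ as in Appendix A.6, and the ancilla contributes the factor $p/(2(p-1))$ taken from the one-qubit unitary there. The function names are hypothetical.

```python
import math

def rescaled_success_probability(p):
    """(1 - 1/p) times the ancilla factor p / (2(p - 1)); equals 1/2 for every p."""
    return (1 - 1 / p) * p / (2 * (p - 1))

def amplify_once(a):
    """(good, bad) amplitudes after S_A followed by A S_0 A^{-1}, starting from A|0>."""
    psi = (math.sqrt(a), math.sqrt(1 - a))        # A|0> in the (good, bad) basis
    phi = (1j * psi[0], psi[1])                   # S_A: phase i on the good component
    overlap = psi[0] * phi[0] + psi[1] * phi[1]   # <psi|phi>; psi has real entries
    # A S_0 A^{-1} = I - (1 - i)|psi><psi|, because S_0 = I - (1 - i)|0><0|.
    return tuple(phi[k] - (1 - 1j) * overlap * psi[k] for k in range(2))

good, bad = amplify_once(0.5)
print(abs(good), abs(bad))
```

For $a = 1/2$ the bad amplitude vanishes and the good amplitude has modulus 1 (up to a global phase), whereas for other values such as $a = 0.9$ a single iteration leaves a nonzero bad amplitude.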

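For the input $x = x_q^2 = g^l \in G$ ($0 \le l \le p-1$), the second part of van Dam's exact algorithm
transforms the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|0\rangle^{\otimes m}$ into the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|sl \bmod p\rangle$ as follows:

5. Apply $F_p$ to the last $m$ qubits to prepare the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle \left(\frac{1}{\sqrt{p}} \sum_{a=0}^{p-1} |a\rangle\right)$.

6. Apply $D_x\colon |y\rangle|a\rangle \mapsto |y \cdot x^{-a} \bmod q\rangle|a\rangle$ ($0 \le y \le q-1$, $0 \le a \le p-1$) to the last $n + m$
qubits to prepare the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle \left(\frac{1}{\sqrt{p}} \sum_{a=0}^{p-1} \omega_p^{sla} |a\rangle\right)$.

7. Apply $F_p^{\dagger}$ to the last $m$ qubits to prepare the state $\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|sl \bmod p\rangle$.

¹ To show this, we need to show that the "weighted" threshold gates are in $\mathrm{QTC}^0_f$. We can simply show this as in [15].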
One-qubit projective measurements in the basis $|0\rangle, |1\rangle$ on the state in Step 7 yield the classical
outcomes $s$ and $sl \bmod p$ for some $1 \le s \le p-1$. Since $\gcd(s,p) = 1$, we can compute $sl \cdot s^{-1} \bmod p = l$,
which is the desired result, by a poly($n$)-time exact classical algorithm. Steps 5 and 7 are in $\mathrm{QNC}^0_f$ by
our assumption. Step 6 is in $\mathrm{QNC}^0_f$ since, as in Step 2, $D_x$ can be implemented by using arithmetic
operations with the pre-computed values $x^{2^j} \bmod q$ and $(x^{-1})^{2^j} \bmod q$. This analysis implies Theorem 3.
The details of the proof are given in Appendix A.6.
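The classical post-processing amounts to one modular inversion. A minimal sketch, with a hypothetical function name, and assuming Python 3.8 or later, where the built-in `pow` accepts the exponent -1 together with a modulus:

```python
def recover_log(s, sl_mod_p, p):
    """Return l from the measured pair (s, s*l mod p), using gcd(s, p) = 1."""
    if not 1 <= s <= p - 1:
        raise ValueError("s must satisfy 1 <= s <= p - 1")
    return sl_mod_p * pow(s, -1, p) % p

# Example: p = 11 and l = 7; every possible outcome s gives back the same l.
print([recover_log(s, s * 7 % 11, 11) for s in range(1, 11)])
```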

As described above, the (relation) problem of finding $s$ and $sl \bmod p$ for some $1 \le s \le p-1$ for
the input $x = g^l \in G$ with the pre-computed values can be solved exactly by the $\mathrm{QNC}^0_f$ circuit with
gates for $F_p$. On the other hand, the problem is classically hard under the plausible assumption that
the DLP over $G_q$ is classically hard, since otherwise we can easily show that the plausible assumption
does not hold. Thus, under the plausible assumption, there exists a classically hard problem that is
solvable exactly by a $\mathrm{QNC}^0_f$ circuit with gates for $F_p$.

6 Open Problems 

Interesting challenges would be to find ways of improving our quantum circuits and to further study 
the relationships between the complexity classes. We give some examples of such problems: 

• Does there exist an $O(1)$-depth $O(n)$-size exact or approximate quantum circuit for $\mathrm{OR}_n$?

• Does there exist an $O(1)$-depth $O(n \log n)$-size exact quantum circuit for $\mathrm{TH}_n^t$ for any $1 \le t \le n$?

• Does it hold that $F_p$ is in $\mathrm{QNC}^0_f$?

• The classes $\mathrm{QAC}^0$ and $\mathrm{QTC}^0$ are defined similarly to $\mathrm{QAC}^0_f$ and $\mathrm{QTC}^0_f$, respectively, except that
unbounded fan-out gates are not allowed. Does it hold that $\mathrm{QAC}^0 \subsetneq \mathrm{QAC}^0_f$ or $\mathrm{QTC}^0 \subsetneq \mathrm{QTC}^0_f$?

• Does there exist a fundamental gate that is as powerful as an unbounded fan-out gate?
A Proofs

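A.1 Proof of Lemma 1

We show this lemma by induction on $n$ (without using the Fourier expansion of $\mathrm{OR}_n$ explicitly).
It is obvious that the lemma holds when $n = 1$. We assume that it holds when $n = k$. For any
$x = x_0 \cdots x_k \in \{0,1\}^{k+1}$,

$$\frac{1}{2^k} \sum_{a \in \{0,1\}^{k+1} \setminus \{0^{k+1}\}} \mathrm{PA}_{k+1}^a(x) = \frac{1}{2^k} \sum_{a \in \{0,1\}^k \setminus \{0^k\}} \mathrm{PA}_k^a(x_0 \cdots x_{k-1}) + \frac{x_k}{2^k} + \frac{1}{2^k} \sum_{a \in \{0,1\}^k \setminus \{0^k\}} \left(\mathrm{PA}_k^a(x_0 \cdots x_{k-1}) \oplus x_k\right)$$

$$= \begin{cases} \mathrm{OR}_k(x_0 \cdots x_{k-1}), & \text{if } x_k = 0, \\ 1, & \text{otherwise,} \end{cases}$$

where the induction hypothesis implies the second equation. The value is equal to $\mathrm{OR}_{k+1}(x_0 \cdots x_k)$.
Thus, when $n = k + 1$, the lemma holds as desired.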
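The identity proved above can also be confirmed by brute force for small $n$. The sketch below assumes that Lemma 1 states $\mathrm{OR}_n(x) = 2^{-(n-1)} \sum_{a \ne 0^n} \mathrm{PA}_n^a(x)$, where $\mathrm{PA}_n^a(x)$ is the parity of the bits $x_j$ with $a_j = 1$. This reading is consistent with the induction step and with the phase computation in Appendix A.2. The function names are hypothetical.

```python
from itertools import product

def parity_on_subset(a, x):
    """PA^a(x): parity of the bits x_j selected by a_j = 1."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def or_via_parities(x):
    """Sum the parities over all nonzero a and divide by 2^(n-1)."""
    n = len(x)
    total = sum(parity_on_subset(a, x) for a in product((0, 1), repeat=n) if any(a))
    assert total % 2 ** (n - 1) == 0  # the sum is always 0 or 2^(n-1)
    return total // 2 ** (n - 1)

print([or_via_parities(x) for x in product((0, 1), repeat=3)])
```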

A.2 Proof of Lemma 2

Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. As described in Section 3.1, we prepare the states
$|x_j\rangle^{\otimes(2^{n-1}-1)}$ for any $0 \le j \le n-1$, $|\mathrm{PA}_n^a(x)\rangle$ for any $a \in \{0,1\}^n$ such that $|a| \ge 2$, and the $(2^n - 1)$-qubit
state

$$\frac{|0\rangle^{\otimes(2^n-1)} + |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}}.$$

Thus, we prepare the registers $R_j$ for storing the state $|x_j\rangle^{\otimes(2^{n-1}-1)}$ for any $0 \le j \le n-1$, $S$ for
storing all the states $|\mathrm{PA}_n^a(x)\rangle$, and $T$ for storing the $(2^n - 1)$-qubit state. All the registers consist of
qubits initialized to $|0\rangle$. The numbers of qubits in $R_j$, $S$, and $T$ are $2^{n-1} - 1$, $2^n - n - 1$, and $2^n - 1$,
respectively. The circuit is described as follows:

1. Copy the input state $|x\rangle$ and apply the circuit for $\mathrm{PA}_n^a$ to each copy for every $a \in \{0,1\}^n \setminus \{0^n\}$
in parallel.

(a) For each $0 \le j \le n-1$:

Apply an unbounded fan-out gate to the input qubit in state $|x_j\rangle$ and all the qubits in $R_j$,
where the input qubit is used as the control qubit.

(b) For each $0 \le j \le n-1$:

Apply Hadamard gates to all the qubits in $R_j$.

(c) Apply Hadamard gates to all the qubits in $S$.

(d) For each $a = a_0 \cdots a_{n-1} \in \{0,1\}^n$ such that $|a| \ge 2$:

Apply an unbounded fan-out gate to a qubit in $R_{j_0}$, ..., a qubit in $R_{j_{|a|-1}}$, and a qubit in
$S$, where the qubit in $S$ is used as the control qubit and $j_0, \ldots, j_{|a|-1}$ is the unique sequence
of non-negative integers satisfying $a_{j_0} = \cdots = a_{j_{|a|-1}} = 1$ and $j_0 < \cdots < j_{|a|-1}$. All the
gates and the qubits are arranged so that all the gates can be applied in parallel.

(e) This step is the same as Step 1-(b).

(f) This step is the same as Step 1-(c).

2. Apply a Hadamard gate and an unbounded fan-out gate to ancillary qubits.

(a) Apply a Hadamard gate to a qubit in $T$.

(b) Apply an unbounded fan-out gate to all the qubits in $T$, where the qubit to which a
Hadamard gate is applied in Step 2-(a) is used as the control qubit.
3. Apply controlled-$Z(\pi/2^{n-1})$ gates in parallel to the states in Steps 1 and 2.

(a) For each $0 \le j \le n-1$:

Apply a controlled-$Z(\pi/2^{n-1})$ gate to the input qubit in state $|x_j\rangle$ and a qubit in $T$.

(b) For each qubit in $S$:

Apply a controlled-$Z(\pi/2^{n-1})$ gate to the qubit in $S$ and a qubit in $T$.

All the gates and the qubits are arranged so that all the gates can be applied in parallel.

4. Apply an unbounded fan-out gate and a Hadamard gate to the state in Step 3.

(a) This step is the same as Step 2-(b).

(b) This step is the same as Step 2-(a).

The circuit for $n = 3$ is depicted in Fig. 3.

Figure 3: The circuit for $\mathrm{OR}_3$.

The correctness of the circuit is described as follows. Step 1-(a) transforms the state of $R_j$ into
the state $|x_j\rangle^{\otimes(2^{n-1}-1)}$. Since $\mathrm{PA}_n^a(x)$ can be computed by a combination of Hadamard gates and
an unbounded fan-out gate as depicted in Fig. 2, Step 1-(f) stores the state $|\mathrm{PA}_n^a(x)\rangle$ in $S$ for any
$a \in \{0,1\}^n$ such that $|a| \ge 2$. Step 2-(a) prepares the state $(H|0\rangle)|0\rangle^{\otimes(2^n-2)}$ and thus Step 2-(b)
transforms the state of $T$ into the $(2^n - 1)$-qubit state

$$\frac{|0\rangle^{\otimes(2^n-1)} + |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}}.$$

Step 3 transforms the $(2^n - 1)$-qubit state into

$$\frac{|0\rangle^{\otimes(2^n-1)} + e^{i\frac{\pi}{2^{n-1}} \sum_{a \in \{0,1\}^n \setminus \{0^n\}} \mathrm{PA}_n^a(x)} |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}},$$

which is equal to

$$\frac{|0\rangle^{\otimes(2^n-1)} + (-1)^{\mathrm{OR}_n(x)} |1\rangle^{\otimes(2^n-1)}}{\sqrt{2}}$$

by Lemma 1. Since Step 4-(a) yields the state

$$\frac{|0\rangle + (-1)^{\mathrm{OR}_n(x)} |1\rangle}{\sqrt{2}},$$

Step 4-(b) outputs the desired state $|\mathrm{OR}_n(x)\rangle$.

By the construction, the depth of the whole circuit does not depend on $n$. Since Step 1-(a) is the
dominant part and uses $n$ unbounded fan-out gates on $2^{n-1}$ qubits, the size of the whole circuit is
$O(n2^n)$. Thus, the depth and size of the whole circuit are $O(1)$ and $O(n2^n)$, respectively.

A.3 Proof of Lemma 4

Let $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. As described in Section 4.1, we prepare the $(2^m - 1)$-qubit
state $\bigotimes_{k=0}^{m-1} |\varphi_k\rangle^{\otimes 2^k}$, $2^m - 1$ copies of the state $|s_0\rangle$ and $2^{m-k} - 1$ copies of the state $|s_k^y\rangle$, and the
state $|t_k(y)\rangle$ for any $1 \le k \le m-1$ and $y \in \{0,1\}^k$. Thus, we prepare the registers $R$ for storing the
$(2^m - 1)$-qubit state, $S_0$ for storing the copies of the state $|s_0\rangle$, $S_k^y$ for storing the copies of the state
$|s_k^y\rangle$ for any $1 \le k \le m-1$ and $y \in \{0,1\}^k$, $T_0$ for storing the state $|s_0\rangle$, and $T_k$ for storing all the states
$|t_k(y)\rangle$ for any $1 \le k \le m-1$. All the registers consist of qubits initialized to $|0\rangle$. The numbers of
qubits in $R$, $S_0$, $S_k^y$, and $T_k$ are $2^m - 1$, $2^m - 1$, $2^{m-k} - 1$, and $2^k$, respectively. The circuit is described
as follows:

1. Apply a slightly modified version of Høyer and Špalek's OR reduction to the input state $|x\rangle$,
where the output is stored in $R$.

2. Perform $A(0)$- and $A(\pi \sum_{j=0}^{k-1} y_j / 2^{k-j})$-measurements for every $1 \le k \le m-1$ and $y = y_0 \cdots y_{k-1} \in
\{0,1\}^k$ in parallel on the state of $R$.

(a) Perform an $A(0)$-measurement on the state $|\varphi_0\rangle$ of $R$ and let $s_0$ be the classical outcome of
the measurement.

(b) For each $1 \le k \le m-1$ and $y = y_0 \cdots y_{k-1} \in \{0,1\}^k$:

Perform an $A(\pi \sum_{j=0}^{k-1} y_j / 2^{k-j})$-measurement on the state $|\varphi_k\rangle$ of $R$ and let $s_k^y$ be the classical
outcome of the measurement.

3. Prepare $2^m - 1$ copies of the state $|s_0\rangle$ and $2^{m-k} - 1$ copies of the state $|s_k^y\rangle$ and apply the
circuit for $\mathrm{AND}_{k+1}$ (constructed from the circuit for $\mathrm{OR}_{k+1}$ in Section 3) to the states for every
$1 \le k \le m-1$ and $y \in \{0,1\}^k$ in parallel.

(a) Apply NOT gates to all the qubits in $S_0$ if $s_0 = 1$.

(b) For each $1 \le k \le m-1$ and $y = y_0 \cdots y_{k-1} \in \{0,1\}^k$:
Apply NOT gates to all the qubits in $S_k^y$ if $s_k^y = 1$.

(c) Apply a CNOT gate to a qubit in $S_0$ and the qubit in $T_0$, where the qubit in $S_0$ is used as
the control qubit.

(d) For each $1 \le k \le m-1$ and $y = y_0 \cdots y_{k-1} \in \{0,1\}^k$:

• Apply a NOT gate to a qubit (not used in Step 3-(c)) in $S_0$ if $y_0 = 0$, a NOT gate to a
qubit in $S_1^{y_0}$ if $y_1 = 0$, ..., and a NOT gate to a qubit in $S_{k-1}^{y_0 \cdots y_{k-2}}$ if $y_{k-1} = 0$. All the
gates and the qubits are arranged so that all the gates can be applied in parallel.

• Apply a gate for $\mathrm{AND}_{k+1}$ to the qubit in $S_0$, the qubit in $S_1^{y_0}$, ..., the qubit in $S_{k-1}^{y_0 \cdots y_{k-2}}$,
a qubit in $S_k^y$, and a qubit in $T_k$, where the output is stored in $T_k$. All the gates and
the qubits are arranged so that all the gates can be applied in parallel.

4. Apply the circuit for $\mathrm{PA}_{2^k}$ for every $1 \le k \le m-1$ in parallel to the state in Step 3.

(a) For each $1 \le k \le m-1$:

Apply Hadamard gates to all the qubits in $T_k$.

(b) For each $1 \le k \le m-1$:

Apply an unbounded fan-out gate to all the qubits in $T_k$.

(c) This step is the same as Step 4-(a).



Figure 4: The circuit for Steps 3-(c) and 3-(d) for m = 3.



The circuit for Steps 3-(c) and 3-(d) for $m = 3$ is depicted in Fig. 4. The first half of the whole circuit
contains many one-qubit projective measurements and unitary operations depending on the classical
outcomes of the measurements. We can replace them with unitary operations including controlled
operations and with measurements in the computational basis only at the end of the circuit by using
the well-known method of coherently implementing measurements [18].

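The correctness of the circuit is described as follows. Step 1 transforms the state of $R$ into the
state $\bigotimes_{k=0}^{m-1} |\varphi_k\rangle^{\otimes 2^k}$. Step 2 yields the classical outcomes of the measurements, each in $\{0,1\}$. By the
definition of the measurements, the outcome of the $A(0)$-measurement is $s_0$ and it holds that
$s_k = s_k^{s_0 \cdots s_{k-1}}$ for any $1 \le k \le m-1$. Steps 3-(a) and 3-(b) transform the state
of $S_0$ into $|s_0\rangle^{\otimes(2^m-1)}$ and the state of $S_k^y$ into $|s_k^y\rangle^{\otimes(2^{m-k}-1)}$ for any $1 \le k \le m-1$. Steps 3-(c) and
3-(d) transform the state of $T_0$ into $|s_0\rangle$ and the state of $T_k$ into $\bigotimes_{y \in \{0,1\}^k} |t_k(y)\rangle$. Step 4 transforms
the state of a qubit in $T_k$ into $|\bigoplus_{y \in \{0,1\}^k} t_k(y)\rangle$ for any $1 \le k \le m-1$. For any $1 \le k \le m-1$ and
$y \in \{0,1\}^k$, the NOT and AND gates in Step 3-(d) compute

$$t_k(y) = \bigwedge_{j=0}^{k-1} \left(s_j^{y_0 \cdots y_{j-1}} \oplus y_j \oplus 1\right) \wedge s_k^y.$$

The $j$-th factor is 1 if and only if $y_j = s_j^{y_0 \cdots y_{j-1}}$, so, by induction on $j$, all the factors are 1 if and only if
$y_j = s_j$ for every $0 \le j \le k-1$, and thus

$$t_k(y) = \begin{cases} s_k, & \text{if } y = s_0 \cdots s_{k-1}, \\ 0, & \text{otherwise.} \end{cases}$$

Therefore, $\bigoplus_{y \in \{0,1\}^k} t_k(y) = s_k$ for any $1 \le k \le m-1$. Thus, Step 4 outputs the desired state $|s_k\rangle$ for
any $1 \le k \le m-1$.

By the construction, the depth of the whole circuit does not depend on $n$. Since Step 1 is
the dominant part and the state $\bigotimes_{k=0}^{m-1} |\varphi_k\rangle^{\otimes 2^k}$ in Step 1 can be prepared with a circuit of size
$O(n \sum_{k=0}^{m-1} 2^k) = O(n^2)$ as in Høyer and Špalek's OR reduction, the size of the whole circuit is $O(n^2)$.
Therefore, the depth and size of the whole circuit are $O(1)$ and $O(n^2)$, respectively.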



A.4 Proof of Lemma 5

Let $t$ be an integer satisfying $1 \le t \le \lceil n/2 \rceil$ and $|x\rangle = |x_0\rangle \cdots |x_{n-1}\rangle$ be an input state. Let $l$ be
an integer satisfying $0 \le l < \lceil \log(t+1) \rceil$. This means that $l$ is less than the length of the binary
representation of $t$. Let $t_0 \cdots t_{l-1}$ be the $l$ low-order bits of the binary representation of $t$, where $t_0$ is
the lowest-order bit. Note that the value $t - \sum_{j=0}^{l-1} t_j 2^j$ is positive and is a multiple of $2^l$. The first
circuit is described as follows:

1. Apply the circuit in Lemma 4 to the input state $|x\rangle$, where we regard $m$ in the proof of Lemma 4
as $l$. Let $|s_0\rangle \cdots |s_{l-1}\rangle$ be the output. In other words, $s_0 \cdots s_{l-1}$ are the $l$ low-order bits of the
binary representation of $|x|$, where $s_0$ is the lowest-order bit.

2. Apply the first circuit in Lemma 3 to the input state $|x\rangle$, where we consider only $0 \le k \le t-1$
such that the $l$ low-order bits of the binary representation of $k$ are equal to $s_0 \cdots s_{l-1}$. More
concretely,

$$k = M 2^l + \sum_{j=0}^{l-1} s_j 2^j$$

for any integer $M$ satisfying

$$0 \le M \le \frac{t - \sum_{j=0}^{l-1} t_j 2^j}{2^l} - 1$$

if $\sum_{j=0}^{l-1} s_j 2^j \ge \sum_{j=0}^{l-1} t_j 2^j$, and

$$0 \le M \le \frac{t - \sum_{j=0}^{l-1} t_j 2^j}{2^l}$$

otherwise.

We note that, before Step 2, we prepare all the binary representations of $k$ satisfying the above
conditions by applying unbounded fan-out gates and NOT gates to ancillary qubits (initialized to $|0\rangle$).

As in the proof of Lemma 3, the circuit outputs the desired state $|\mathrm{TH}_n^t(x)\rangle$ and the depth of
the whole circuit does not depend on $n$. The sizes of the circuits in Steps 1 and 2 are $O(2^l n)$ and
$O(2^{-l} t n \log n)$, respectively, since $M \le t/2^l$. Thus, the depth and size of the whole circuit are $O(1)$
and $O(2^l n + 2^{-l} t n \log n)$, respectively. To construct the second circuit, we use the second circuit in
Lemma 3, where we consider only $t \le k \le n$ such that the $l$ low-order bits of the binary representation
of $k$ are $s_0 \cdots s_{l-1}$. The number of $k$'s we need to consider is bounded above by $(n - t + 1)/2^l + 2$ and
thus the depth and size of the resulting circuit are $O(1)$ and $O(2^l n + 2^{-l}(n - t + 1) n \log n + n \log n)$,
respectively.
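The range of $M$ in Step 2 of the first circuit can be checked against a direct enumeration. The following sketch uses hypothetical names. Here `s_low` stands for $\sum_j s_j 2^j$ and `t_low` for $\sum_j t_j 2^j$, and the case split uses the condition $\sum_j s_j 2^j \ge \sum_j t_j 2^j$.

```python
def candidate_ks(t, l, s_low):
    """All 0 <= k <= t-1 whose l low-order bits equal s_low, as k = M * 2^l + s_low."""
    t_low = t % 2**l                 # value of the l low-order bits of t
    top = (t - t_low) // 2**l        # (t - t_low) / 2^l, a positive integer
    max_m = top - 1 if s_low >= t_low else top
    return [m * 2**l + s_low for m in range(max_m + 1)]

# Example: t = 5 and l = 2; low-order bits 00 leave k in {0, 4}, bits 01 leave only k = 1.
print(candidate_ks(5, 2, 0), candidate_ks(5, 2, 1))
```

The list has at most $t/2^l + 1$ entries, which is the count behind the $O(2^{-l} t n \log n)$ size of Step 2.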



A.5 Proof of Theorem 2

For any $1 \le t \le \log n$, it holds that $0 \le \lceil \log(t+1) \rceil - 1 < \lceil \log(t+1) \rceil$ and thus we set $l = \lceil \log(t+1) \rceil - 1$
in the first circuit in Lemma 5. This yields an $O(n \log n)$-size circuit. For any $\log n < t \le \lceil n/2 \rceil$, it
holds that $0 \le \lceil \log \sqrt{t \log n} \rceil - 1 < \lceil \log(t+1) \rceil$ and thus we set $l = \lceil \log \sqrt{t \log n} \rceil - 1$ in the first circuit
in Lemma 5. This yields an $O(n\sqrt{t \log n})$-size circuit. For any $\lceil n/2 \rceil < t \le n - \log n$, it holds that
$0 \le \lceil \log \sqrt{(n - t + 1) \log n} \rceil - 1 < \lceil \log(t+1) \rceil$ and thus we set $l = \lceil \log \sqrt{(n - t + 1) \log n} \rceil - 1$ in the
second circuit in Lemma 5. This yields an $O(n\sqrt{(n - t + 1) \log n})$-size circuit. For any $n - \log n < t \le n$,
it holds that $0 \le \lceil \log(n - t + 2) \rceil - 1 < \lceil \log(t+1) \rceil$ and thus we set $l = \lceil \log(n - t + 2) \rceil - 1$ in the
second circuit in Lemma 5. This yields an $O(n \log n)$-size circuit.



A.6 Proof of Theorem 3

As described in Section 5, it suffices to show that, if $F_p$ is in $\mathrm{QNC}^0_f$, there exists a poly($n$)-time exact
classical algorithm for the DLP over $G$ using the $\mathrm{QNC}^0_f$ oracle. We consider a slightly modified version
of van Dam's exact algorithm for the DLP. The main difference is that the slightly modified version does
not include intermediate measurements. This allows us to consider an exact algorithm with a simple
structure: a poly($n$)-time classical pre-processing, a query to the $\mathrm{QNC}^0_f$ oracle, and a poly($n$)-time
classical post-processing.

Let $x = g^l \in G$ ($0 \le l \le p-1$) be an input. In the classical pre-processing step, we compute the
values $g^{2^j} \bmod q$, $x^{2^j} \bmod q$, and $(x^{-1})^{2^j} \bmod q$ ($0 \le j \le m-1$) by a poly($n$)-time exact classical
algorithm. By a query to the $\mathrm{QNC}^0_f$ oracle, we solve the problem of finding $s$ and $sl \bmod p$ for some
$1 \le s \le p-1$ using the pre-computed values. In the classical post-processing step, using the values
$s$ and $sl \bmod p$ obtained from the $\mathrm{QNC}^0_f$ oracle, we compute $sl \cdot s^{-1} \bmod p = l$, which is the desired
output, by a poly($n$)-time exact classical algorithm. This can always be done since $\gcd(s,p) = 1$ for
any $1 \le s \le p-1$. Thus, the only problem is to show that the $\mathrm{QNC}^0_f$ oracle can solve the problem, in
other words, to show that the problem can be solved exactly by a $\mathrm{QNC}^0_f$ circuit (if $F_p$ is in $\mathrm{QNC}^0_f$).

The quantum algorithm for solving the problem consists of two parts $Q_1$ and $Q_2$. We note that
we can use the pre-computed values described above in the quantum algorithm. The first part $Q_1$
transforms the state $|0\rangle^{\otimes(m+n+1)}$ into the state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|1\rangle,$$

which is independent of the input $x$. To define $Q_1$, we define the following algorithm as $A$, where the
input state is $|0\rangle^{\otimes(m+n+1)}$:

1. Apply $F_p$ to the first $m$ qubits of the input state.

2. Apply the modular exponentiation operation $|r\rangle|0\rangle \mapsto |r\rangle|g^r \bmod q\rangle$ to the state in Step 1.

3. Apply $F_p$ to the first $m$ qubits of the state in Step 2.

4. Apply the one-qubit unitary operation defined by

$$\frac{1}{\sqrt{2(p-1)}} \begin{pmatrix} \sqrt{p-2} & -\sqrt{p} \\ \sqrt{p} & \sqrt{p-2} \end{pmatrix}$$

to the last one qubit of the state in Step 3.

A direct calculation shows that $A$ transforms the input state into

$$\frac{1}{\sqrt{p}} \sum_{s=0}^{p-1} |s\rangle|\chi^s\rangle \otimes \frac{\sqrt{p-2}\,|0\rangle + \sqrt{p}\,|1\rangle}{\sqrt{2(p-1)}}.$$

We define

$$|A\rangle = \frac{1}{\sqrt{2(p-1)}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|1\rangle.$$

It holds that $\langle A|A\rangle = 1/2$. Let $S_0$ be the quantum operation that changes the phase of a state by
$i$ if and only if the state is $|0\rangle^{\otimes(m+n+1)}$. Similarly, let $S_A$ be the quantum operation that changes the
phase of a state by $i$ if and only if the state of the first $m$ qubits is not $|0\rangle^{\otimes m}$ and the state of the
last one qubit is $|1\rangle$. We define the Grover iteration $G = A S_0 A^{-1} S_A$ and the first part $Q_1 = GA$.
The correctness of $Q_1$ follows from the direct calculation as in the amplitude amplification procedure
in [HIS].

The argument in Section 5 implies that $A$ is in $\mathrm{QNC}^0_f$. Moreover, by Theorem 1, $S_0$ and $S_A$ are in
$\mathrm{QNC}^0_f$. Thus, $Q_1$ is in $\mathrm{QNC}^0_f$. We note that the last qubit in state $|1\rangle$ is not important for the second
part $Q_2$ described below (and thus can be ignored below) and that the pre-computed values used in
Step 2 on ancillary qubits have no effect on the amplitude amplification procedure.

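Recall that the quantum operation $D_x$ is defined as

$$|y\rangle|a\rangle \mapsto |y \cdot x^{-a} \bmod q\rangle|a\rangle,$$

where $0 \le y \le q-1$ and $0 \le a \le p-1$. Before considering the second part $Q_2$, we show that the
relationship

$$D_x|\chi^s\rangle|a\rangle = \omega_p^{sla}|\chi^s\rangle|a\rangle$$

holds for any $0 \le s \le p-1$ and $0 \le a \le p-1$ by the following direct calculation:

$$D_x|\chi^s\rangle|a\rangle = \frac{1}{\sqrt{p}} \sum_{r=0}^{p-1} \omega_p^{rs} |g^r \cdot x^{-a} \bmod q\rangle|a\rangle = \frac{1}{\sqrt{p}} \sum_{r=0}^{p-1} \omega_p^{rs} |g^{r-la} \bmod q\rangle|a\rangle$$

$$= \frac{\omega_p^{sla}}{\sqrt{p}} \sum_{r=0}^{p-1} \omega_p^{s(r-la)} |g^{r-la} \bmod q\rangle|a\rangle = \omega_p^{sla}|\chi^s\rangle|a\rangle.$$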
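The eigenvalue relation can be confirmed numerically for a small instance. The sketch below is an illustration under assumptions: it uses the toy parameters $q = 23$, $p = 11$, $g = 2$, represents $|\chi^s\rangle$ as a dictionary from basis labels to amplitudes, fixes the second register to a classical value $a$, and relies on Python 3.8 or later for negative exponents in the three-argument `pow`. All names are hypothetical.

```python
import cmath

def chi(s, g, p, q):
    """|chi^s> as a dict: basis label g^r mod q -> amplitude omega_p^(r*s) / sqrt(p)."""
    omega = cmath.exp(2j * cmath.pi / p)
    return {pow(g, r, q): omega ** (r * s) / p ** 0.5 for r in range(p)}

def apply_d_x(state, x, a, q):
    """D_x with the second register fixed to a: |y> -> |y * x^(-a) mod q>."""
    shift = pow(x, -a, q)
    return {y * shift % q: amp for y, amp in state.items()}

def eigen_residual(s, l, a, g=2, p=11, q=23):
    """Largest entry of D_x|chi^s> - omega_p^(s*l*a)|chi^s> for x = g^l."""
    x = pow(g, l, q)
    omega = cmath.exp(2j * cmath.pi / p)
    before = chi(s, g, p, q)
    after = apply_d_x(before, x, a, q)
    return max(abs(after[y] - omega ** (s * l * a) * before[y]) for y in before)

print(eigen_residual(3, 7, 5))
```

The residual is zero up to floating-point rounding, in line with $|\chi^s\rangle$ being an eigenvector of the multiplication by $x^{-a}$ with eigenvalue $\omega_p^{sla}$.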

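We consider the second part $Q_2$ that transforms the input state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|0\rangle^{\otimes m},$$

which is obtained by $Q_1$ with $m$ qubits initialized to $|0\rangle$, into the state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|sl \bmod p\rangle.$$

We define the following algorithm as $Q_2$:

5. Apply $F_p$ to the last $m$ qubits of the input state.

6. Apply $D_x$ to the last $n + m$ qubits of the state in Step 5.

7. Apply $F_p^{\dagger}$ to the last $m$ qubits of the state in Step 6.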

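The correctness of $Q_2$ is described as follows. Step 5 transforms the input state into the state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle \left(\frac{1}{\sqrt{p}} \sum_{a=0}^{p-1} |a\rangle\right).$$

By the relationship shown above, Step 6 transforms the state in Step 5 into the state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle \left(\frac{1}{\sqrt{p}} \sum_{a=0}^{p-1} \omega_p^{sla} |a\rangle\right).$$

Step 7 transforms the state in Step 6 into the desired state

$$\frac{1}{\sqrt{p-1}} \sum_{s=1}^{p-1} |s\rangle|\chi^s\rangle|sl \bmod p\rangle.$$

We perform one-qubit projective measurements in the basis $|0\rangle, |1\rangle$ on the first $m$ qubits and the last $m$
qubits of the state in Step 7. This yields the classical outcomes $s$ and $sl \bmod p$ for some $1 \le s \le p-1$.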

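Steps 5 and 7 are in $\mathrm{QNC}^0_f$ by our assumption. In Step 6, as in Step 2 of $A$, $D_x$ is implemented by
using the iterated multiplication operation with the pre-computed values ($x^{2^j} \bmod q$ and $(x^{-1})^{2^j} \bmod q$)
and the modular multiplication operation as follows:

$$|y\rangle|a\rangle|0\rangle^{\otimes n}|0\rangle^{\otimes n} \mapsto |y\rangle|a\rangle|x^{-a} \bmod q\rangle|0\rangle^{\otimes n}$$
$$\mapsto |y\rangle|a\rangle|x^{-a} \bmod q\rangle|y \cdot x^{-a} \bmod q\rangle$$
$$\mapsto |y\rangle|a\rangle|0\rangle^{\otimes n}|y \cdot x^{-a} \bmod q\rangle$$
$$\mapsto |y\rangle|a\rangle|x^{a} \bmod q\rangle|y \cdot x^{-a} \bmod q\rangle$$
$$\mapsto |0\rangle^{\otimes n}|a\rangle|x^{a} \bmod q\rangle|y \cdot x^{-a} \bmod q\rangle$$
$$\mapsto |0\rangle^{\otimes n}|a\rangle|0\rangle^{\otimes n}|y \cdot x^{-a} \bmod q\rangle.$$

Since $\mathrm{QNC}^0_f = \mathrm{QTC}^0_f$ as shown in Section 3 and $\mathrm{QTC}^0_f$ includes the iterated multiplication operation
and the modular multiplication operation [22], Step 6 is in $\mathrm{QNC}^0_f$. Therefore, $Q_2$ is in $\mathrm{QNC}^0_f$.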
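The six register transformations used for $D_x$ can be traced on classical basis states. The sketch below is a hypothetical model, not the circuit itself: each step writes its result into a register by bitwise XOR, which is one common way to keep every step reversible, and the powers $x^{\pm a} \bmod q$ are computed directly with `pow` (Python 3.8 or later) instead of by iterated multiplication of the pre-computed values.

```python
def d_x_with_ancilla(y, a, x, q):
    """Track [y-register, a, ancilla, output] through the six steps on a basis state."""
    x_inv_a = pow(x, -a, q)              # x^(-a) mod q
    x_a = pow(x, a, q)                   # x^a mod q
    regs = [y, a, 0, 0]
    regs[2] ^= x_inv_a                   # 1: compute x^(-a) into the ancilla
    regs[3] ^= regs[0] * regs[2] % q     # 2: output <- y * x^(-a) mod q
    regs[2] ^= x_inv_a                   # 3: uncompute the ancilla
    regs[2] ^= x_a                       # 4: compute x^a into the ancilla
    regs[0] ^= regs[3] * regs[2] % q     # 5: clear y, since y = (y x^(-a)) x^a mod q
    regs[2] ^= x_a                       # 6: uncompute the ancilla
    return regs

# Example with q = 23 and x = 2: the first and third registers end in 0.
print(d_x_with_ancilla(5, 3, 2, 23))
```

The final state carries $y \cdot x^{-a} \bmod q$ in the output register while the ancilla and the original $y$-register are returned to zero, which matches the last line of the sequence above.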